We study the distribution of traffic in networks whose users try to minimise
their delays by adhering to a simple learning scheme inspired by the
replicator dynamics of evolutionary game theory. The stable steady states 
of these dynamics coincide with the network's Wardrop equilibria and form 
a convex polytope whose dimension is determined by the network's redun- 
dancy (an important concept which measures the "linear dependence" of the 
users' paths). Despite this abundance of stationary points, the long-term 
behaviour of the replicator dynamics turns out to be remarkably simple: 
every solution orbit converges to a Wardrop equilibrium. 

On the other hand, a major challenge occurs when the users' delays fluc- 
tuate unpredictably due to random external factors. In that case, interior 
equilibria are no longer stationary, but strict equilibria remain stochasti- 
cally stable irrespective of the fluctuations' magnitude. In fact, if the net- 
work has no redundancy and the users are patient enough, we show that 
the long-term average of the users' traffic flows converges to the vicinity of
an equilibrium, and we also estimate the corresponding invariant measure. 

1. Introduction. The underlying problem of managing the flow of traffic in 
a large-scale network is as simple to state as it is challenging to resolve: given the 
rates of traffic generated by the users of the network, one is asked to identify and 
realise the most "satisfactory" distribution of traffic among the network's routes. 

Of course, given that this notion of "satisfaction" depends on the users' optimi- 
sation criteria, it would serve well to keep a concrete example in mind. Perhaps the 
most illustrative one is that of the Internet itself, where the primary concern of its 
users is to minimise the travel times of their data flows. However, since the time 
needed to traverse a link in the network increases (nonlinearly even) as the link 
becomes more congested, the users' concurrent minimisation efforts invariably lead 
to game-like interactions whose complexity precludes even the most rudimentary 
attempts at coordination. In this way, a traffic distribution will be considered "sat- 
isfactory" by a user when there is no unilateral move that he could make in order 
to further decrease the delays (or latencies) that he experiences. 

This Nash-type condition is aptly captured by Wardrop's principle (Wardrop,
1952): given the level of congestion caused by other users, every user seeks to employ
the minimum-latency path available to him. As might be expected, this principle
has attracted a great deal of interest and it was shown early on that these Wardrop 
equilibria can be calculated by solving a convex optimisation problem (Beckmann, 
McGuire and Winsten, 1956; Dafermos and Sparrow, 1969). Among others, this 
characterisation enabled Roughgarden and Tardos (2002, 2004) to quantify the effi- 
ciency of these equilibrial states by estimating their "price of anarchy", i.e. the ratio 
between the aggregate delay of a flow at Wardrop equilibrium and the minimum 
achievable (aggregate) latency (Koutsoupias and Papadimitriou, 1999). 

Still, the size of large-scale networks makes computing these equilibria a task of 
considerable difficulty, clearly beyond the users' individual deductive capabilities. 
Moreover, a user has no incentive to actually play out his component of an equilibrial 
traffic allocation unless he is convinced that his opponents will also employ theirs 
(an argument which gains additional momentum if there are multiple equilibria).
It is thus more reasonable to take a less centralised approach and instead ask: is 
there a simple learning procedure which leads users to Wardrop equilibrium? 

Even though the static properties of Wardrop equilibria have been studied quite 
extensively, this question has been left relatively unexplored. In fact, it was only 
recently that the work of Sandholm (2001) showed that a good candidate for such 
a learning scheme would be the replicator dynamics of evolutionary game theory, a 
dynamical system that was first introduced by Taylor and Jonker (1978) to model 
the evolution of (nonatomic) populations that interact with one another by means 
of random matchings in a Nash game. More precisely, these dynamics arise as the 
byproduct of an "imitation of the fittest" process which drives the per capita growth 
rate of a genotype (strategy) proportionately to the difference between the repro- 
ductive fitness (payoff) of the genotype itself and the population average. Thus, 
owing to this correlation between growth rates and payoffs, the game's Nash equilibria
emerge as $\omega$-limit points of the replicator trajectories - see also the excellent
surveys by Weibull (1995) and by Hofbauer and Sigmund (1998, 2003).

In our congestion setting, these populations correspond to the users' traffic flows, 
so the convex optimisation formulation of Beckmann, McGuire and Winsten allows 
us to recast our problem in terms of a (nonatomic) potential game (Sandholm, 
2001). Indeed, Wardrop equilibria can be located by looking at the minimum of the 
Rosenthal potential (Rosenthal, 1973) and, hence, Sandholm's analysis shows that 
they are Lyapunov stable rest points of the replicator dynamics. This fact was also 
recognized independently by Fischer and Vöcking (2004) who additionally showed
that the (interior) solution orbits of the replicator dynamics converge to the set of 
Wardrop equilibria - actually, the authors suggest that these orbits converge to a 
point, but their analysis only holds when there is a unique equilibrium. 

Rather surprisingly, when there is not a unique equilibrium, the structure of 
the Wardrop set itself seems to have been overlooked in the above considerations. 
Specifically, it has been widely assumed that if the network's delay functions are 
strictly increasing, then there exists a unique Wardrop equilibrium (for instance, 
see Sandholm, 2001, Corollary 5.6). As a matter of fact, this uniqueness property 
is only true in irreducible networks, i.e. networks whose paths are "independent" of 
one another (in a sense made precise by Definition 2.1). In general, the Wardrop set 
of a network is a convex polytope whose dimension is determined by the network's
redundancy, a notion which quantifies precisely this "linear dependence". Nonetheless,
despite this added structure, we show that the expectations of Fischer and
Vöcking are vindicated in that the long-term behaviour of the replicator dynamics
remains disarmingly simple: (almost) every replicator orbit converges to a Wardrop 
flow and not merely to the set of such flows (Theorem 3.2). 

Having said that, the imitation procedure inherent in the replicator dynamics 
implicitly presumes that users have perfectly accurate information at their
disposal. Unfortunately however, this assumption is not very realistic in networks 
which exhibit wild delay fluctuations as the result of interference by random exoge- 
nous factors (commonly gathered under the collective moniker "nature"). In popu- 
lation biology, these disturbances are usually modelled by introducing "aggregate 
shocks" to the replicator dynamics (Fudenberg and Harris, 1992) and, as one would 
expect, these shocks complicate the situation considerably. For instance, Cabrales 
(2000) proved that dominated strategies become extinct in the long run, but only if 
the variance of the shocks is mild enough compared to the payoffs of the game. More 
recently, Imhof (2005) showed that even equilibrial play arises over time but, again, 
conditionally on the noise processes not being too loud (see also Benaïm, Hofbauer
and Sandholm, 2008; Hofbauer and Imhof, 2009). On the other hand, if one inter- 
prets the replicator dynamics as the derivative of an exponential learning procedure 
and perturbs them accordingly (i.e. not as an evolutionary birth-death process), it 
was shown that similar rationality properties continue to hold, no matter how loud 
the noise becomes (Mertikopoulos and Moustakas, 2009a,b). 

All the same, these approaches have chiefly focused on Nash-type games where 
payoffs are multilinear functions over a product of simplices; for example, payoffs in 
single-population evolutionary games are determined by the bilinear form which is 
associated to the matrix of the game. This linear structure simplifies things consid- 
erably but, unfortunately, congestion models rarely adhere to it; additionally, the 
notions of Nash and Wardrop equilibrium are at variance on many occasions, a disparity
which also calls for a different approach; and, finally, the way that stochastic
fluctuations propagate to the users' choices in a network leads to a new stochastic
version of the replicator dynamics where the noise processes are no longer indepen- 
dent across users (different paths might share a common subset of links over which 
disturbances are strongly correlated). On that account, the effect of stochastic fluc- 
tuations in congestion models cannot be understood by simply translating previous 
work on the stochastic replicator dynamics. 

1.1. Outline. In this paper, we study the distribution of traffic in networks 
whose links are subject to constant stochastic perturbations that randomly affect 
the delays experienced by individual traffic elements. This model is presented in 
detail in Section 2, where we also develop our game-theoretic machinery: specifically, 
we introduce the notion of a network's redundancy in Section 2.2, and we examine 
its connection to Wardrop equilibria in Section 2.3. We then derive the rationality 
properties of the deterministic replicator dynamics in Section 3, where we show 
that (almost) every solution trajectory converges to a Wardrop equilibrium. 
Section 4 is devoted to the stochastic considerations which constitute the core
of our paper. Our first result is that strict Wardrop equilibria remain stochastically
asymptotically stable irrespective of the fluctuations' magnitude (Theorem 4.3); in
fact, if the users are "patient enough", we are able to estimate the average time it 
takes them to hit a neighbourhood of the equilibrium in question (Theorem 4.4). In 
conjunction with stochastic stability, this allows us to conclude that when a strict 
equilibrium exists, users converge to it almost surely (Corollary 4.5). On the other 
hand, given that such equilibria do not always exist, we also prove that the replicator 
dynamics in irreducible networks are recurrent (again under the assumption that 
the users are patient enough), and we use this fact to show that the long-term 
average of their traffic distributions concentrates mass in the neighbourhood of an 
interior Wardrop equilibrium (Theorem 4.6). 

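1.2. Notational Conventions. If $\mathcal{S} = \{s_\alpha\}_{\alpha=0}^{n}$ is a finite set, the vector space
spanned by $\mathcal{S}$ over $\mathbb{R}$ is defined to be the set of all formal linear combinations of
elements of $\mathcal{S}$ with real coefficients, i.e. the set of all functions $x : \mathcal{S} \to \mathbb{R}$. In tune
with standard set-theoretic notation, we will denote this space by $\mathbb{R}^{\mathcal{S}} \equiv \mathrm{Maps}(\mathcal{S}, \mathbb{R})$.
In this way, $\mathbb{R}^{\mathcal{S}}$ admits a canonical basis $\{e_\alpha\}_{\alpha=0}^{n}$ consisting of the indicator
functions $e_\alpha : \mathcal{S} \to \mathbb{R}$ which take the value $e_\alpha(s_\alpha) = 1$ on $s_\alpha$ and vanish otherwise; in
particular, if $x \in \mathbb{R}^{\mathcal{S}}$ has $x(s_\alpha) = x_\alpha$, we will have $x = \sum_\alpha x_\alpha e_\alpha$. Hence, under the
natural identification $s_\alpha \mapsto e_\alpha$, we will make no distinction between the elements
$s_\alpha$ of $\mathcal{S}$ and the corresponding basis vectors $e_\alpha$ of $\mathbb{R}^{\mathcal{S}}$ - in fact, to avoid drowning in
a morass of indices, we will routinely use $\alpha$ to refer interchangeably to either $s_\alpha$ or
$e_\alpha$, writing e.g. "$\alpha \in \mathcal{S}$" instead of "$s_\alpha \in \mathcal{S}$". In the same vein, we will also identify
the set $\Delta(\mathcal{S})$ of probability measures on $\mathcal{S}$ with the standard $n$-dimensional simplex
of $\mathbb{R}^{\mathcal{S}}$: $\Delta(\mathcal{S}) \equiv \{x \in \mathbb{R}^{\mathcal{S}} : \sum_\alpha x_\alpha = 1 \text{ and } x_\alpha \ge 0\}$.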

Concerning players and their strategies, we will follow the original convention of
Nash (1951) and employ Latin indices ($i, j, \dots$) for players while reserving Greek
ones ($\alpha, \beta, \dots$) for their (pure) strategies; also, to differentiate between strategies,
we will use $\alpha, \beta, \dots$ for indices that start at 0 and $\mu, \nu, \dots$ for those that start at 1.
Moreover, if the players' action sets $\mathcal{A}_i$ are disjoint (as is typically the case), we will
identify their union $\bigcup_i \mathcal{A}_i$ with their disjoint union $\mathcal{A} \equiv \coprod_i \mathcal{A}_i \equiv \bigcup_i \{(\alpha, i) : \alpha \in \mathcal{A}_i\}$
by mapping $\alpha \in \mathcal{A}_i \mapsto (\alpha, i) \in \mathcal{A}$. Hence, if $\{e_{i\alpha}\}$ is the natural basis of $\mathbb{R}^{\mathcal{A}_i}$ and
$\{e_\alpha\}$ is the corresponding basis of $\mathbb{R}^{\mathcal{A}} \cong \prod_i \mathbb{R}^{\mathcal{A}_i}$, we will occasionally drop the index
$i$ altogether and write $x = \sum_\alpha x_\alpha e_\alpha \in \mathbb{R}^{\mathcal{A}}$ instead of $x = \sum_{i,\alpha} x_{i\alpha} e_{i\alpha} \in \prod_i \mathbb{R}^{\mathcal{A}_i}$.
Similarly, when it is clear from the context that we are summing over the strategy
set $\mathcal{A}_i$ of player $i$, we will use the shorthand $\sum^i_\alpha \equiv \sum_{\alpha \in \mathcal{A}_i}$.

Finally, if $X(t)$ is some stochastic process in $\mathbb{R}^n$ starting at $X(0) = x$ and there
is no doubt that we are referring to the process $X$, its law will be denoted by $P_x$.
In that case, we will also employ the term "almost surely" instead of the somewhat
unwieldy "$P_x$-almost surely".

2. Preliminaries. 

2.1. Games in Normal Form. Our starting point for the definition of a game
in normal form will be a set of players $\mathcal{N}$, together with a finite measure $\nu$ on $\mathcal{N}$
which "accounts" for all players $i \in \mathcal{N}$ (in the sense that the singletons $\{i\} \subseteq \mathcal{N}$ are
all $\nu$-measurable).

The players' possible actions in the game will then be represented by their strategy
sets $\mathcal{A}_i$, $i \in \mathcal{N}$. For our purposes, we will assume that these sets are locally
compact Hausdorff spaces and that the relative topologies induced on $\mathcal{A}_i \cap \mathcal{A}_j$
agree for all $i, j \in \mathcal{N}$. Thanks to this compatibility condition, $\mathcal{A}_0 \equiv \bigcup_i \mathcal{A}_i$ inherits
a natural Borel structure arising from the union topology (the finest topology in
which the inclusions $\mathcal{A}_i \hookrightarrow \mathcal{A}_0$ are continuous) and, in this way, an admissible strategy
profile $x \in \prod_i \mathcal{A}_i$ will just be a measurable function $x : \mathcal{N} \to \mathcal{A}_0$ which maps
$i \mapsto x_i \in \mathcal{A}_i$ for all players $i \in \mathcal{N}$. For technical reasons, we will also require that
the push-forward measure $x_*\nu$ induced on $\mathcal{A}_0$ by $x$ (given by $x_*\nu(U) = \nu(x^{-1}(U))$
for any Borel $U \subseteq \mathcal{A}_0$) be inner regular, and, hence, Radon (since $\nu$ is finite).

As is customary, we will identify two profiles which agree $\nu$-almost everywhere,
except when we need to focus on the strategy of a particular player $i \in \mathcal{N}$ against
that of his opponents $\mathcal{N}_{-i} \equiv \mathcal{N} \setminus \{i\}$; in that case, we will use the shorthand $(x_{-i}; q_i)$
to denote the profile which agrees with $x$ on $\mathcal{N}_{-i}$ ($\nu$-a.e.) and maps $i \mapsto q_i \in \mathcal{A}_i$.
The set $\mathcal{A}$ of all such profiles $x \in \prod_i \mathcal{A}_i$ will then be referred to as the strategy space
of the game and is itself a Borel space because it inherits the subspace topology
from the product $\mathcal{A}_0^{\mathcal{N}}$.

Bearing all this in mind, the fitness of the players' strategic choices will be
determined by their payoff functions (or utilities) $u_i : \mathcal{A} \to \mathbb{R}$, $i \in \mathcal{N}$; in particular,
$u_i(x) \equiv u_i(x_{-i}; x_i)$ will simply represent the reward that player $i \in \mathcal{N}$ receives in the
strategy profile $x = (x_{-i}; x_i) \in \mathcal{A}$, i.e. when he plays $x_i \in \mathcal{A}_i$ against his opponents'
strategy $x_{-i} \in \prod_{j \neq i} \mathcal{A}_j$. The only further assumptions that we will make are that
these payoff functions be (Borel) measurable and that $u_i(x_{-i}; x_i) = u_i(x'_{-i}; x_i)$
whenever $x$ and $x'$ agree $\nu$-a.e. on $\mathcal{N}_{-i}$.

This collection of players $i \in \mathcal{N}$, their strategy sets $\mathcal{A}_i$, and their payoff functions
$u_i : \mathcal{A} \to \mathbb{R}$ will be our working definition for a game in normal form, usually
denoted by $\mathfrak{G} \equiv \mathfrak{G}(\mathcal{N}, \mathcal{A}, u)$. Additionally, if the payoff functions $u_i : \mathcal{A} \to \mathbb{R}$
happen to be continuous, the game $\mathfrak{G}$ will be called continuous as well.

Needless to say, this abstract definition might appear somewhat opaque, so we 
will immediately proceed with a few important examples to clarify the concept. 

2.1.1. N-person Games. As the name suggests, the players here are indexed by
the finite set $\mathcal{N} = \{1, 2, \dots, N\}$ (endowed with the usual counting measure) and the
game's strategy space will be the finite product $\mathcal{A} \equiv \prod_i \mathcal{A}_i$ (thus doing away with
some of the technical subtleties present in the more general definition).

This point is where we recover the original scenario of Nash (1951). To see
how, assume that every player $i \in \mathcal{N}$ comes with a finite set $\mathcal{A}_i$ of actions (or
pure strategies) which can be "mixed" according to some probability distribution
$x_i \in \Delta(\mathcal{A}_i)$. In this interpretation, the players' strategy sets are just the simplices
$\Delta_i \equiv \Delta(\mathcal{A}_i)$ and their payoff functions $u_i : \Delta \equiv \prod_i \Delta_i \to \mathbb{R}$ are given by the
multilinear expectations:

(2.1) $u_i(x) \equiv u_i(x_1, \dots, x_N) = \sum_{\alpha_1 \in \mathcal{A}_1} \cdots \sum_{\alpha_N \in \mathcal{A}_N} x_{1,\alpha_1} \cdots x_{N,\alpha_N} \, u_{i,\alpha_1 \dots \alpha_N}$,

where $x_i = \sum^i_\alpha x_{i\alpha} e_{i\alpha}$ in the standard basis $\{e_{i\alpha}\}$ of $\mathbb{R}^{\mathcal{A}_i}$ and $u_{i,\alpha_1 \dots \alpha_N}$ is the
reward that player $i$ would obtain by choosing $\alpha_i \in \mathcal{A}_i$ against his opponents' action
$\alpha_{-i} \in \mathcal{A}_{-i} \equiv \prod_{j \neq i} \mathcal{A}_j$. Because of this (multi)linear structure, we will commonly
refer to Nash-type games as linear games to contrast them with more general N-person
games where payoffs and strategy sets might fail to have any sort of linear
structure - as is the case for example with concave games (Rosen, 1965).
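
As a concrete illustration of (2.1), the following Python sketch evaluates the multilinear expectation by summing over all pure action profiles. The function name `expected_payoff`, the dictionary layout of the reward table and the matching-pennies numbers are assumptions made for this example only; they do not come from the text.

```python
from itertools import product


def expected_payoff(rewards, mixed):
    """Multilinear expectation in the spirit of (2.1).

    rewards: dict mapping a pure action profile (tuple of action indices,
             one per player) to the reward of the player under consideration.
    mixed:   list of probability vectors, one per player.
    """
    total = 0.0
    for profile in product(*(range(len(x_i)) for x_i in mixed)):
        weight = 1.0
        for player, action in enumerate(profile):
            weight *= mixed[player][action]  # x_{1,a_1} * ... * x_{N,a_N}
        total += weight * rewards[profile]
    return total


# Hypothetical example: rewards of player 1 in matching pennies.
pennies = {(0, 0): 1.0, (0, 1): -1.0, (1, 0): -1.0, (1, 1): 1.0}

uniform_value = expected_payoff(pennies, [[0.5, 0.5], [0.5, 0.5]])
skewed_value = expected_payoff(pennies, [[1.0, 0.0], [0.25, 0.75]])
```

The cost of this direct sum grows with the product of the sizes of the action sets, so it is only meant to make the structure of (2.1) explicit for small games.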

2.1.2. Population Games. The cornerstone of evolutionary game theory con- 
cerns games played by an uncountable number of players - for instance, see Schmei- 
dler (1973). As such, these nonatomic population games require the full breadth 
afforded by our more abstract definition. 

The first piece of additional structure encountered in these games is a measurable
partition $\mathcal{N} = \bigcup_{r=1}^{N} \mathcal{N}_r$ of the player set $\mathcal{N}$ into $N$ disjoint populations (or classes)
$\mathcal{N}_r$; accordingly, every player $i \in \mathcal{N}$ belongs to a unique class $\mathcal{N}_r$ which we denote
by class($i$). Each of these populations is then "measured" by the corresponding
restriction $\nu_r$ of the measure $\nu$ on $\mathcal{N}_r$ (i.e. $\nu_r(B) = \nu(B \cap \mathcal{N}_r)$ for any Borel $B \subseteq \mathcal{N}$),
and the basic underlying assumption is that these measures are nonatomic.

The second fundamental assumption is that this classification of players also
determines how they interact with their environment and with each other. More
precisely, this means that the strategy sets of two players that belong to the same
population coincide: $\mathcal{A}_i = \mathcal{A}_j$ whenever class($i$) = class($j$). Because of this, we
will write $\mathcal{A}_r$ for the common strategy set of the $r$-th population and $\mathcal{A}_0$ for the
corresponding union: $\mathcal{A}_0 \equiv \bigcup_r \mathcal{A}_r = \bigcup_{i \in \mathcal{N}} \mathcal{A}_i$.

Now, every strategy profile $x : \mathcal{N} \to \mathcal{A}_0$ pushes forward a (Radon) measure $\hat{x}_r$
on $\mathcal{A}_r$ in the usual way:

(2.2) $\hat{x}_r(U) = (x_*\nu_r)(U) = \nu_r(x^{-1}(U)) = \nu\{i \in \mathcal{N}_r : x_i \in U\}$

for any Borel $U \subseteq \mathcal{A}_r$ - in other words, $\hat{x}_r(U)$ is just the measure of the players
in the $r$-th population whose chosen strategy lies in $U \subseteq \mathcal{A}_r$. Then, the final
(and perhaps most significant) requirement in population games is that the players'
payoffs depend only on the strategy distribution $\hat{x} = (\hat{x}_1, \dots, \hat{x}_N)$ and not on the
players' individual strategic choices. Specifically, if $\mathcal{P}_0(\mathcal{A})$ denotes the space of all
such strategy distributions equipped with the topology of vague convergence, we
assume that there exist continuous functions $\hat{u}_r : \mathcal{P}_0(\mathcal{A}) \times \mathcal{A}_r \to \mathbb{R}$, $r = 1, \dots, N$,
such that:

(2.3) $u_i(x) = \hat{u}_r(\hat{x}; x_i)$ for all $i \in \mathcal{N}_r$.

Consequently, as long as the overall strategy distribution $\hat{x}$ stays the same, payoffs
remain unaffected even by positive-mass migrations of players from one strategy to
another (and not only by migrations of measure zero).

Again, it would serve well to illustrate this abstract definition by means of a
more concrete example. To wit, in evolutionary game theory, populations are usually
represented by the intervals $\mathcal{N}_r = [0, m_r]$ where $m_r > 0$ denotes the "mass"
of the population under Lebesgue measure. The strategy spaces $\mathcal{A}_r$ are typically
assumed to be finite, so that a strategy distribution is simply a point in the
(finite-dimensional) product of simplices $\prod_r m_r \Delta(\mathcal{A}_r)$. Hence, if player $i \in \mathcal{N}_r$ picks the
strategy $\alpha \in \mathcal{A}_r$, his payoff will be given by

(2.4) $u_{r\alpha}(x) \equiv u_r(x; \alpha)$,

where, in a slight abuse of notation, we removed the hats from $\hat{u}_r$ and $\hat{x}$ in order
to stress that they are the fundamental quantities that describe the game (it will
always be clear from the context whether we are referring to the distribution
$x \in \mathcal{P}_0(\mathcal{A})$ or to the actual strategy profile $x : \mathcal{N} \to \mathcal{A}_0$).

This choice of notation is very suggestive for another reason as well: if we set
$\Delta_r \equiv m_r \Delta(\mathcal{A}_r)$, then these simplices may be taken as the strategy sets of an
associated N-person game whose players are indexed by $r = 1, 2, \dots, N$ (that is, they
correspond to the populations themselves). The only thing needed to complete this
description is to define the payoff functions $u_r : \Delta \equiv \prod_r \Delta_r \to \mathbb{R}$ in this picture,
and a natural choice would be to take the population averages:

(2.5) $u_r(x) = \frac{1}{m_r} \sum^r_\alpha x_{r\alpha} u_{r\alpha}(x)$,

where $x_{r\alpha}$ are the coordinates of $x$ in $\Delta$. However, it is worth keeping in mind that,
depending on the situation at hand, this need not be the only reasonable choice for
a payoff function (we will explore this issue further in the next section).

Potential Games. An important subclass of population games arises when the
payoffs $u_{r\alpha}$ satisfy the closedness condition:

(2.6) $\frac{\partial u_{r\alpha}}{\partial x_{s\beta}} = \frac{\partial u_{s\beta}}{\partial x_{r\alpha}}$ for all populations $r, s$ and for all strategies $\alpha \in \mathcal{A}_r$, $\beta \in \mathcal{A}_s$.

This condition is commonly referred to as "externality symmetry" (Sandholm, 2001)
and it describes games where a marginal increase in the population of players using
strategy $\alpha$ has the same effect on the payoffs to players playing strategy $\beta$ as the
converse increase. Clearly, since the strategy distributions of these games live in the
simply connected polytope $\Delta \equiv \prod_r \Delta_r$, condition (2.6) amounts to the existence
of a potential function $F : \Delta \to \mathbb{R}$ such that:

(2.7) $u_{r\alpha}(x) = -\frac{\partial F}{\partial x_{r\alpha}}$.

Hence, if a player $i \in \mathcal{N}_r$ makes the switch $\alpha \to \beta$, his payoff will change by:

(2.8) $u_{r\beta}(x) - u_{r\alpha}(x) = -\left( \frac{\partial F}{\partial x_{r\beta}} - \frac{\partial F}{\partial x_{r\alpha}} \right) = -dF(e_{r\beta} - e_{r\alpha})$,

where $\{e_{r\beta}\}$ denotes the standard basis of $\prod_r \mathbb{R}^{\mathcal{A}_r}$. In other words, the strategy
migration $\alpha \to \beta$ is profitable to a player iff the direction $e_{r\beta} - e_{r\alpha}$ descends the
potential $F$. This property of potential games will be extremely important for our
purposes and its ramifications underlie a large part of our work.
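
To make (2.7) and (2.8) tangible, here is a minimal Python sketch for a single population with a quadratic potential $F(x) = \frac{1}{2} x^T A x$, where the symmetric matrix `A` and the point `x` are hypothetical values chosen only for illustration. Payoffs are defined through (2.7), so externality symmetry (2.6) holds because `A` is symmetric, and the payoff gain of a switch is compared with a finite-difference estimate of $dF(e_\beta - e_\alpha)$.

```python
def potential(x, A):
    """Quadratic potential F(x) = 1/2 * x^T A x (A assumed symmetric)."""
    n = len(x)
    return 0.5 * sum(x[a] * A[a][b] * x[b] for a in range(n) for b in range(n))


def payoffs(x, A):
    """Payoffs u_a(x) = -dF/dx_a = -(A x)_a, following (2.7)."""
    n = len(x)
    return [-sum(A[a][b] * x[b] for b in range(n)) for a in range(n)]


def directional_derivative(x, A, a, b, h=1e-6):
    """Central finite-difference estimate of dF(e_b - e_a) at the point x."""
    plus, minus = list(x), list(x)
    plus[b] += h
    plus[a] -= h
    minus[b] -= h
    minus[a] += h
    return (potential(plus, A) - potential(minus, A)) / (2 * h)


# Hypothetical data: two strategies, unit population mass.
A = [[2.0, 1.0], [1.0, 3.0]]
x = [0.5, 0.5]

u = payoffs(x, A)
gain = u[1] - u[0]                           # payoff change of the switch 0 -> 1
slope = directional_derivative(x, A, 0, 1)   # dF(e_1 - e_0)
```

In this example the switch 0 -> 1 lowers the player's payoff and the direction $e_1 - e_0$ ascends the potential, in agreement with the sign convention of (2.8).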
2.1.3. Nash Equilibrium and Wardrop's Principle. Under the umbrella of rationality,
selfish players will seek to play those strategies which deliver the best rewards
against the choices of their opponents. This leads to the celebrated notion of a Nash
equilibrium, i.e. a strategy profile $q$ which discourages unilateral deviations:

(NEQ) $u_i(q) \ge u_i(q_{-i}; q_i')$ for almost every $i \in \mathcal{N}$ and all strategies $q_i' \in \mathcal{A}_i$

(see also Schmeidler (1973) or Milchtaich (2000)).

The seminal result of Nash (1951) was that N-person linear games always possess
equilibria of this kind. Rosen (1965) subsequently extended this result to the class 
of concave games (continuous concave payoffs over convex strategy sets), while 
Schmeidler (1973) essentially settled the issue for population games with finite 
strategy sets (see also Ali Khan, 1986). 

In this last instance, Nash equilibria are aptly captured by Wardrop's principle:

(2.9) $u_{r\alpha}(q) \ge u_{r\beta}(q)$ for all $\alpha, \beta \in \mathcal{A}_r$ s.t. $q$ assigns positive mass to $\alpha$.

To see this, note that if $\alpha \in \mathcal{A}_r$ has positive measure in the strategy distribution
$q$, then there exists a player $i \in \mathcal{N}_r$ (actually a positive mass of such players) with
$q_i = \alpha$ and such that (NEQ) holds. Hence, for every $\beta \in \mathcal{A}_r$, we immediately get:

(2.10) $u_{r\alpha}(q) = u_i(q_{-i}; \alpha) \ge u_i(q_{-i}; \beta) = u_{r\beta}(q)$.

If the game in question is also a potential one, we have seen that beneficial 
migrations descend the potential function, so the minima of the potential corre- 
spond to strategy distributions where no unilateral improvement is possible. In 
fact, the Kuhn-Tucker conditions for the game's potential coincide precisely with 
the Wardrop characterisation (2.9) and, hence, the game's equilibria will be the 
critical points of the potential (Sandholm, 2001, Proposition 3.1). 
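
Condition (2.9) is easy to test numerically for a single population with finitely many strategies: every strategy that carries positive mass must attain the maximal payoff. The sketch below is one possible way to write this check in Python; the helper name `is_wardrop`, the tolerance and the two payoff functions are hypothetical and serve only as an illustration.

```python
def is_wardrop(payoff_fn, q, tol=1e-9):
    """Check (2.9): each strategy with positive mass in q earns the best payoff.

    payoff_fn: maps a distribution q (list of masses) to the list of payoffs.
    q:         candidate strategy distribution of a single population.
    """
    u = payoff_fn(q)
    best = max(u)
    return all(u[a] >= best - tol for a in range(len(q)) if q[a] > tol)


# Hypothetical payoffs (negative delays) on two routes with unit total mass.
def linear_delays(q):
    return [-q[0], -3.0 * q[1]]


def offset_delays(q):
    return [-q[0], -(2.0 + q[1])]


interior_ok = is_wardrop(linear_delays, [0.75, 0.25])   # both routes equally good
interior_bad = is_wardrop(linear_delays, [0.5, 0.5])    # route 1 is used but worse
boundary_ok = is_wardrop(offset_delays, [1.0, 0.0])     # unused route may be worse
```

Note that the check places no constraint on strategies with zero mass, which is exactly why the boundary distribution in the last example passes.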

On account of the above, the equilibrium characterisation (2.9) will be central 
in our analysis, so we will examine it in depth in the sections that follow. En 
passant, we only note here that a similar condition can be laid down for population 
games with continuous strategy sets. This case has recently attracted quite a bit of 
interest, but since we will not need this added generality, we will not press the issue 
further - see instead Cressman (2005) or Hofbauer, Oechssler and Riedel (2009).

2.2. Networks and Flows. Stated somewhat informally, our chief interest lies in 
networks whose nodes produce traffic that seeks to reach its destination as quickly 
as possible. However, since the time taken to traverse a path in a network increases 
as the network becomes congested, it is hardly an easy task to pick the "path of 
least resistance" - especially given that users compete against each other in their 
endeavours. As a result, the game-theoretic setup of the previous section turns out 
to be remarkably appropriate for the analysis of these traffic flows. 

Following Roughgarden and Tardos (2002, 2004), let $\mathcal{G} \equiv \mathcal{G}(\mathcal{V}, \mathcal{E})$ be a (finite)
directed graph with node set $\mathcal{V}$ and edge set $\mathcal{E}$, and let $\sigma = (v, w)$ be an
origin-destination pair in $\mathcal{G}$ (i.e. an ordered pair of nodes $v, w \in \mathcal{V}$ that can be joined by
a path in $\mathcal{G}$). Suppose further that the origin $v$ of $\sigma$ outputs traffic towards the
destination node $w$ at some rate $\rho > 0$; then, the pair $\sigma$ together with the rate $\rho$
will be referred to as a user of $\mathcal{G}$. In this way, a network $\mathcal{Q} \equiv \mathcal{Q}(\mathcal{N}, \mathcal{A})$ in $\mathcal{G}$ will
consist of a set of users $\mathcal{N}$ (indexed by $i = 1, \dots, N$), together with an associated
collection $\mathcal{A} \equiv \coprod_i \mathcal{A}_i$ of sets of paths (or routes) $\mathcal{A}_i = \{\alpha_{i,0}, \alpha_{i,1}, \dots\}$ joining $v_i$ to
$w_i$ (where $\sigma_i = (v_i, w_i)$ is the origin-destination pair of user $i \in \mathcal{N}$).

Two remarks of a book-keeping nature are now in order: first, since we will
only be interested in users with at least a modicum of choice on how to route
their traffic, we will take $|\mathcal{A}_i| \ge 2$ for all $i$. Secondly, we will be assuming that
the origin-destination pairs of distinct users are themselves distinct. Fortunately,
neither assumption is crucial: if there is only one route available to user $i$, the traffic
rate $\rho_i$ can be considered as a constant load on the route; and if two users $i, j \in \mathcal{N}$
with rates $\rho_i, \rho_j$ share the same origin-destination pair, we will replace them by a
single user with rate $\rho_i + \rho_j$ (see also Section 2.3). This means that the sets $\mathcal{A}_i$
can be assumed disjoint and, as a pleasant byproduct, the path index $\alpha \in \mathcal{A}_i$ fully
characterizes the user $i$ to whom it belongs - cf. the conventions of Section 1.2.

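So, if $x_{i\alpha} \equiv x_\alpha$ denotes the amount of traffic that user $i$ routes via the path
$\alpha \in \mathcal{A}_i$, the corresponding traffic flow may be represented as $x_i = \sum^i_\alpha x_{i\alpha} e_{i\alpha}$,
where $\{e_{i\alpha}\}$ is the standard basis of the space $V_i \equiv \mathbb{R}^{\mathcal{A}_i}$. However, for such a flow
to be admissible, we must also have $x_{i\alpha} \ge 0$ and $\sum^i_\alpha x_{i\alpha} = \rho_i$; hence, the set of
admissible flows for user $i$ will be the simplex $\Delta_i \equiv \rho_i \Delta(\mathcal{A}_i) = \{x_i \in V_i : x_{i\alpha} \ge 0$
and $\sum^i_\alpha x_{i\alpha} = \rho_i\}$. Then, by collecting all these individual flows in a single
profile, a flow in the network $\mathcal{Q}$ will simply be a point $x = \sum_i x_i \in \Delta \equiv \prod_i \Delta_i$.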

An alternative (and very useful!) description of a flow $x \in \Delta$ can be obtained by
looking at the traffic load that the flow induces on the edges of the network, i.e. at
the amount of traffic $y_r$ that circulates in each edge $r \in \mathcal{E}$ of $\mathcal{G}$. In particular:

(2.11) $y_r = \sum_i y_{ir} = \sum_i \sum^i_{\alpha \ni r} x_{i\alpha}$,

where $y_{ir} = \sum^i_{\alpha \ni r} x_{i\alpha}$ is the load induced on $r \in \mathcal{E}$ by the individual flow $x_i \in \Delta_i$. In
this manner, a very important question that arises is whether these two descriptions
are equivalent; put differently, whether one can recover the flow distribution $x \in \Delta$
from the loads $y_r$ on the edges of the network.

To answer this question, let $\{\varepsilon_r\}$ be the standard basis of the space $W \equiv \mathbb{R}^{\mathcal{E}}$
spanned by the edges $\mathcal{E}$ of $\mathcal{G}$ and consider the indicator map $P^i : V_i \to W$ which
sends a path $\alpha \in \mathcal{A}_i$ to the sum of its constituent edges: $P^i(e_{i\alpha}) = \sum_{r \in \alpha} \varepsilon_r$;
obviously, if we set $P^i(e_{i\alpha}) = \sum_r P^i_{r\alpha} \varepsilon_r$, we see that the entries of $P^i$ will be
$P^i_{r\alpha} = 1$ if $r \in \alpha$ and 0 otherwise. We can then aggregate this construction over all
$i \in \mathcal{N}$ by considering the product space $V \equiv \mathbb{R}^{\mathcal{A}} \cong \prod_i V_i$ and the corresponding
indicator matrix $P \equiv P^1 \oplus \cdots \oplus P^N$ whose entries take the value $P_{r\alpha} = 1$ if the path
$\alpha \in \mathcal{A}$ employs the edge $r$ and vanish otherwise. By doing just that, (2.11) takes
the simpler form $y_r = \sum_\alpha P_{r\alpha} x_\alpha$ or, even more succinctly, $y = P(x)$. Therefore,
the question of whether a flow can be recovered from a load profile can be answered
in the positive if the indicator map $P : V \to W$ is injective.
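
The construction of the indicator matrix and of the load profile $y = P(x)$ can be sketched in a few lines of Python. The network below is hypothetical (it is not claimed to be the one of Fig. 1): nodes $s$, $m$, $t$ with two parallel edges from $s$ to $m$ (edges 0 and 2) and two from $m$ to $t$ (edges 1 and 3); user 1 routes from $s$ to $t$, user 2 from $s$ to $m$ and user 3 from $m$ to $t$. The function names and the flows are likewise assumptions made for the example.

```python
def indicator_matrix(paths, num_edges):
    """P[r][a] = 1 if path a employs edge r, and 0 otherwise."""
    return [[1.0 if r in path else 0.0 for path in paths] for r in range(num_edges)]


def edge_loads(P, x):
    """Load profile y = P(x), i.e. y_r = sum_a P[r][a] * x[a]."""
    return [sum(p_ra * x_a for p_ra, x_a in zip(row, x)) for row in P]


# Paths listed user by user, each given as the set of edges it employs.
paths = [
    {0, 1}, {2, 3},   # user 1: alpha_{1,0}, alpha_{1,1}
    {0}, {2},         # user 2: alpha_{2,0}, alpha_{2,1}
    {1}, {3},         # user 3: alpha_{3,0}, alpha_{3,1}
]
P = indicator_matrix(paths, num_edges=4)

# A flow in which every user routes 2 units of traffic.
x = [1.0, 1.0, 0.5, 1.5, 2.0, 0.0]
y = edge_loads(P, x)

# Shift traffic along a direction that preserves each user's total rate.
z = [1.0, -1.0, -1.0, 1.0, -1.0, 1.0]
x_shifted = [x_a + 0.5 * z_a for x_a, z_a in zip(x, z)]
y_shifted = edge_loads(P, x_shifted)
```

In this example the two admissible flows `x` and `x_shifted` induce the same edge loads, so the flow cannot be recovered from the load profile alone; this is the phenomenon that the question above is concerned with.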

P. MERTIKOPOULOS AND A. L. MOUSTAKAS

Fig 1. The addition of a user may increase the redundancy of a network. (a) An irreducible
network with no linearly dependent paths; red($\mathcal{Q}$) = 0. (b) A reducible network in which
$\alpha_{1,0} + \alpha_{2,1} + \alpha_{3,1} = \alpha_{1,1} + \alpha_{2,0} + \alpha_{3,0}$; red($\mathcal{Q}$) = 1.

This, however, is not the end of the matter because the individual flows $x_i \in \Delta_i$
actually live in the affine subspaces $p_i + Z_i$ where $p_i = \rho_i |\mathcal{A}_i|^{-1} \sum^i_\alpha e_{i\alpha}$ is the barycentre
of $\Delta_i$ and $Z_i \equiv T_{p_i}\Delta_i = \{z_i \in V_i : \sum^i_\alpha z_{i\alpha} = 0\}$ is the tangent space to $\Delta_i$ at $p_i$ - it
is also worth keeping in mind that if we set $\mathcal{A}_i^* = \mathcal{A}_i \setminus \{\alpha_{i,0}\}$, then $Z_i \cong \mathbb{R}^{\mathcal{A}_i^*}$. As a
result, what is actually of essence here is the action of $P$ on the subspaces $Z_i \le V_i$,
i.e. the restriction $Q \equiv P|_Z : Z \to W$ of $P$ on the subspace $Z \equiv T_p\Delta = \prod_i Z_i$,
where $p = (p_1, \dots, p_N)$ is the barycentre of $\Delta$. In this way, any two flows $x, x' \in \Delta$
will have $z = x' - x \in Z$ and the respective loads $y, y' \in W$ will satisfy:

(2.12) $y' - y = P(x') - P(x) = P(z) = Q(z)$,

so that $y' = y$ iff $x' - x \in \ker Q$. Under this light, it becomes clear that a flow
$x \in \Delta$ can be recovered from the corresponding load profile $y \in W$ if and only if
$Q$ is injective. For this reason, the map $Q : Z \to W$ will be called the redundancy
matrix of the network $\mathcal{Q}$, giving rise to:

Definition 2.1. Let $\mathcal{Q}$ be a network in a graph $\mathcal{G}$ and let $Q$ be the redundancy
matrix of $\mathcal{Q}$. The redundancy red($\mathcal{Q}$) of $\mathcal{Q}$ is defined to be:

(2.13) $\mathrm{red}(\mathcal{Q}) \equiv \dim(\ker Q)$.

If red($\mathcal{Q}$) = 0, the network $\mathcal{Q}$ will be called irreducible; otherwise, $\mathcal{Q}$ will be called
reducible.

The rationale behind this terminology should be clear enough: when a network $\mathcal{Q}$
is reducible, some of its routes are "linearly dependent" and the respective directions
in $\ker Q$ are "redundant" (in the sense that they are not reflected on the edge loads).
By comparison, the degrees of freedom of irreducible networks are all active and
any statement concerning the network's edges may be translated to one concerning
its routes.

This dichotomy between reducible and irreducible networks will be quite significant
for our purposes, so it is worth dwelling on Definition 2.1 for a bit more;
specifically, it will be important to have a simple recipe with which to compute the
redundancy matrix $Q$ of a network $\mathcal{Q}$. To that end, let $Q^i = P^i|_{Z_i}$ be the restriction
of $P^i$ on $Z_i$ and, as before, let $\{e_{i,0}, e_{i,1}, \dots\}$ be the standard basis of $V_i = \mathbb{R}^{\mathcal{A}_i}$.
Then, the vectors $\tilde{e}_{i\mu} = e_{i\mu} - e_{i,0}$, $\mu \in \mathcal{A}_i^* \equiv \mathcal{A}_i \setminus \{0\}$, constitute a basis for $Z_i$ and
it is easy to see that the matrix elements of $Q^i$ in this basis will be given by:

(2.14) $Q^i_{r\mu} = P^i_{r\mu} - P^i_{r,0}$.

The above suggests that if there are too many users in a network, then it is 
highly unlikely that the network will be irreducible. Indeed, we have: 

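Proposition 2.2. Let $\mathcal{Q}(\mathcal{N}, \mathcal{A})$ be a network in the graph $\mathcal{G}(\mathcal{V}, \mathcal{E})$ and let $\mathcal{E}' \subseteq \mathcal{E}$
be the set of edges that are present in $\mathcal{Q}$. Then:

(2.15) $\mathrm{red}(\mathcal{Q}) \ge |\mathcal{N}| - |\mathcal{E}'|.$

Hence, a network will always be reducible if the number of users exceeds the number
of available links.

Proof. From the definition of $Q : Z \to W$ we can easily see that $\operatorname{im} Q$ is
contained in the subspace of $W$ that is spanned by $\mathcal{E}'$; furthermore, since every user
has $|\mathcal{A}_i| \ge 2$ routes to choose from, it follows that $\dim Z = \sum_i (|\mathcal{A}_i| - 1) \ge |\mathcal{N}|$.
Therefore: $\mathrm{red}(\mathcal{Q}) = \dim(\ker Q) = \dim Z - \dim(\operatorname{im} Q) \ge |\mathcal{N}| - |\mathcal{E}'|$. □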
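
To make the recipe concrete, the following minimal sketch computes the redundancy of a network from its route-edge incidence blocks by forming the differences of each route against a reference route and applying the rank-nullity theorem. The function name `redundancy`, the block layout (edges as rows, one block of route columns per user, column 0 as the reference route) and the three toy networks are illustrative assumptions and are not taken from the text.

```python
import numpy as np


def redundancy(P_blocks):
    """Return dim ker Q for a network given by per-user incidence blocks.

    P_blocks[i] is an (edges x routes_i) 0/1 array whose column alpha marks
    the edges of route alpha of user i; column 0 is the reference route.
    """
    # Columns of Q in the basis e_{i,mu} - e_{i,0}: Q_{r,mu} = P_{r,mu} - P_{r,0}.
    Q = np.hstack([np.asarray(P, dtype=float)[:, 1:] - np.asarray(P, dtype=float)[:, [0]]
                   for P in P_blocks])
    dim_Z = Q.shape[1]                        # sum over users of (|A_i| - 1)
    return dim_Z - np.linalg.matrix_rank(Q)   # rank-nullity theorem


# Hypothetical example 1: one user, two stages of two parallel links each
# (edges a1, a2, b1, b2) and the four routes a1b1, a1b2, a2b1, a2b2.
P_two_stage = np.array([[1, 1, 0, 0],
                        [0, 0, 1, 1],
                        [1, 0, 1, 0],
                        [0, 1, 0, 1]])

# Hypothetical example 2: one user on a Braess-type graph with edges
# AB, AC, BD, CD, BC and the routes ABD, ACD, ABCD.
P_braess = np.array([[1, 0, 1],
                     [0, 1, 0],
                     [1, 0, 0],
                     [0, 1, 1],
                     [0, 0, 1]])

# Hypothetical example 3: three users sharing the same two parallel links,
# so that the number of users exceeds the number of links.
P_shared = [np.eye(2)] * 3

print(redundancy([P_two_stage]))
print(redundancy([P_braess]))
print(redundancy(P_shared))
```

In the third example the bound of Proposition 2.2 reads $3 - 2 = 1$, so the network is necessarily reducible; the computed redundancy may of course exceed the bound.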

2.3. Congestion Models and Equilibrium. The time spent by a traffic element
on an edge $r \in \mathcal{E}$ of the graph $\mathcal{G}$ will be a function $\phi_r(y_r)$ of the traffic load $y_r$ on
the edge in question - for example, if the edge represents an M/M/1 queue with
capacity $\mu_r$, then $\phi_r(y_r) = 1/(\mu_r - y_r)$. In tune with tradition, we will assume that
these latency (or delay) functions are strictly increasing, and also, to keep things
simple, that they are at least $C^1$ with $\phi_r' > 0$.

On that account, the time needed to traverse an entire route $\alpha \in \mathcal{A}_i$ will be:

(2.16) $\omega_{i\alpha}(x) = \sum_{r \in \alpha} \phi_r(y_r) = \sum_r P^i_{r\alpha}\, \phi_r(y_r),$

where, as before, $P^i_{r\alpha} = 1$ if the edge $r$ belongs to the route $\alpha$ and $0$ otherwise. In
summary, we then have:

Definition 2.3. A congestion model $\mathfrak{C} \equiv \mathfrak{C}(\mathcal{Q}, \phi)$ in a graph $\mathcal{G}(\mathcal{V}, \mathcal{E})$ is a network
$\mathcal{Q}(\mathcal{N}, \mathcal{A})$ of $\mathcal{G}$ equipped with a family of increasing latency functions $\phi_r$, $r \in \mathcal{E}$.
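
As an illustration of (2.16), the sketch below evaluates edge loads and route delays for a small network of M/M/1 links. The three-edge network, the capacities `mu` and the flow `x` are made-up values chosen only so that every load stays below its capacity; the helper name `route_delays` is likewise hypothetical.

```python
import numpy as np


def route_delays(P, x, phi):
    """Edge loads y = P x and route delays sum_r P[r, alpha] * phi_r(y_r)."""
    y = P @ x
    return y, P.T @ phi(y)


# Hypothetical network: route 0 uses edges {0, 2}, route 1 uses edges {1, 2}.
P = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
mu = np.array([4.0, 2.0, 5.0])          # assumed M/M/1 capacities


def mm1(y):
    """M/M/1 latency 1 / (mu_r - y_r); only meaningful while y_r < mu_r."""
    return 1.0 / (mu - y)


x = np.array([2.0, 1.0])                # assumed flow on the two routes
y, omega = route_delays(P, x, mm1)
print("loads:", y)
print("route delays:", omega)
```

Here the shared edge carries the combined load of both routes, which is precisely how one user's choices affect the delays of routes that merely overlap with the ones he employs.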

The similarities between this definition and that of a game in normal form should
be evident: all that is needed to turn Definition 2.3 into an $N$-person game is to
specify its payoff functions. One way to go about this is to consider the user averages:

(2.17) $\omega_i(x) = \frac{1}{\rho_i} \sum_\alpha x_{i\alpha}\, \omega_{i\alpha}(x) = \frac{1}{\rho_i} \sum_r y_{ir}\, \phi_r(y_r),$

where the last equality follows from (2.16) and the definition of $y_{ir} = \sum_\alpha P^i_{r\alpha} x_{i\alpha}$.
Thus, in keeping with the equilibrium condition (NEQ), a flow $q$ will be at Nash
equilibrium in the game $\mathfrak{G}_1 \equiv \mathfrak{G}_1(\mathcal{N}, \Delta, -\omega)$ when:

(NE1) $\omega_i(q) \le \omega_i(q_{-i}; q_i')$ for every user $i \in \mathcal{N}$ and all flows $q_i' \in \Delta_i$.

For many classes of latency functions $\phi_r$, the average delays $\omega_i$ turn out to be convex
and the existence of equilibria is assured by the results of Rosen (1965). However,
not only is this not always the case but, more importantly, the user averages (2.17)
do not necessarily reflect the users' actual optimisation objectives either.

Indeed, another equally justified choice of payoffs is given by the worst delays:

(2.18) $\tilde\omega_i(x) = \max_{\alpha \in \mathrm{supp}(x_i)} \{\omega_{i\alpha}(x)\},$

i.e. the time at which a user's last traffic packet reaches its destination. In that
case, a flow $q$ will be at equilibrium for the game $\mathfrak{G}_2 \equiv \mathfrak{G}_2(\mathcal{N}, \Delta, -\tilde\omega)$ when:

(NE2) $\tilde\omega_i(q) \le \tilde\omega_i(q_{-i}; q_i')$ for every user $i \in \mathcal{N}$ and all flows $q_i' \in \Delta_i$.

Unfortunately, the payoff functions $\tilde\omega_i$ may be discontinuous along any intersection
of faces of $\Delta$ because the support $\mathrm{supp}(x_i) = \{\alpha \in \mathcal{A}_i : x_{i\alpha} > 0\}$ of $x_i$ changes
there as well. Consequently, the existence of equilibrial flows cannot be inferred
from the general theory in this instance either.

On the other hand, if we go back to our original motivation (the Internet), we 
see that our notion of a "user" more accurately portrays the network's routers and 
not its "real-life" users (humans, applications, etc.). However, since these routers 
are not selfish in themselves, conditions (NE1) and (NE2) do not necessarily point
in the right direction either. Instead, the routers' selfless task is to ensure that the
nonatomic traffic elements circulating in the network (the actual selfish entities) 
remain satisfied. It is thus more reasonable to go back to Wardrop's principle (2.9): 

Definition 2.4. A flow $q \in \Delta$ will be at Wardrop equilibrium when
(WEQ) $\omega_{i\alpha}(q) \le \omega_{i\beta}(q)$ for all $i \in \mathcal{N}$ and for all routes $\alpha, \beta \in \mathcal{A}_i$ with $q_{i\alpha} > 0$,
i.e. when every nonatomic traffic element employs the fastest path available to it.

Condition (WEQ) holds as an equality for all routes $\alpha, \beta \in \mathcal{A}_i$ that are employed
in a Wardrop profile $q$. This gives $\omega_i(q) = \omega_{i\alpha}(q)$ for all $\alpha \in \mathrm{supp}(q_i)$ and leads to
the following alternative characterisation of Wardrop flows:

(WEQ') $\omega_i(q) \le \omega_{i\beta}(q)$ for all $i \in \mathcal{N}$ and for all $\beta \in \mathcal{A}_i$.

Even more importantly however, Wardrop equilibria can also be harvested from the
(global) minimum of the Rosenthal potential (Rosenthal, 1973):

(2.19) $\Phi(y) = \sum_r \int_0^{y_r} \phi_r(w)\, dw.$

The reason for calling this function a potential is twofold: firstly, it is the nonatomic
generalisation of the potential function introduced by Monderer and Shapley (1996)
to describe finite congestion games; secondly, the payoff functions $\omega_{i\alpha}$ can be
obtained from $\Phi$ by a simple differentiation. To be sure, if we set $F(x) = \Phi(y)$ where
$y = P(x)$, we readily obtain:

(2.20) $\frac{\partial F}{\partial x_{i\alpha}} = \sum_r \frac{\partial \Phi}{\partial y_r} \frac{\partial y_r}{\partial x_{i\alpha}} = \sum_r P^i_{r\alpha}\, \phi_r(y_r) = \omega_{i\alpha}(x),$

which is exactly the definition of a potential function in the sense of (2.7) - note
also that the "externality symmetry" condition (2.6) can be verified independently:

(2.21) $\frac{\partial \omega_{i\alpha}}{\partial x_{j\beta}} = \sum_r P^i_{r\alpha} P^j_{r\beta}\, \phi_r'(y_r) = \frac{\partial \omega_{j\beta}}{\partial x_{i\alpha}}.$



To describe the exact relation between Wardrop flows and the minima of $\Phi$,
consider the (convex) set $P(\Delta)$ of all load profiles $y$ that result from admissible
flows $x \in \Delta$. Since the latency functions $\phi_r$ are increasing, $\Phi$ will be strictly convex
over $P(\Delta)$ and it will thus have a unique (global) minimum $y^* \in P(\Delta)$. Amazingly
enough, the Kuhn-Tucker conditions that characterise this minimum coincide
with the Wardrop condition (2.9) (Beckmann, McGuire and Winsten, 1956; Dafermos
and Sparrow, 1969; Roughgarden and Tardos, 2002; Sandholm, 2001), so the
Wardrop set of the congestion model $\mathfrak{C}$ will be given by:

(2.22) $\Delta^* = \{x \in \Delta : P(x) = y^*\} = P^{-1}(y^*) \cap \Delta.$

Proposition 2.5. Let $\mathfrak{C} \equiv \mathfrak{C}(\mathcal{Q}, \phi)$ be a congestion model with strictly increasing
latencies $\phi_r$ and let $\Delta^*$ be its set of Wardrop equilibria. Then:

1. any two Wardrop flows exhibit equal loads and delays.

2. $\Delta^*$ is a nonempty convex polytope with $\dim(\Delta^*) \le \mathrm{red}(\mathcal{Q})$; moreover, if there
exists an interior equilibrium $q \in \mathrm{Int}(\Delta)$, then $\dim(\Delta^*) = \mathrm{red}(\mathcal{Q})$.

Since $P^{-1}(y^*)$ is an affine subspace of $\mathbb{R}^{\mathcal{A}}$ and $\Delta$ is a product of simplices, there
is really nothing left to prove (simply observe that if $q$ is an interior Wardrop flow,
then $P^{-1}(y^*)$ intersects the full-dimensional interior of $\Delta$). The only surprise here is
that this result seems to have been overlooked in much of the literature concerning
congestion models: for instance, both Sandholm (2001, Corollary 5.6) and Fischer
and Vöcking (2004, Propositions 2 and 3) presume that Wardrop equilibria are
unique in networks with increasing latencies. However, if there are two distinct
flows $x$, $x'$ leading to the same load profile $y$ (e.g. as in the simple network of
Fig. 1(b)), then the potential function $F(x) = \Phi(P(x))$ is no longer strictly convex:
it is in fact constant along every null direction of the redundancy matrix $Q \equiv P|_Z$.

We thus see that a Wardrop equilibrium is unique iff a) the network $\mathcal{Q}$ is
irreducible, or b) $P^{-1}(y^*)$ only intersects $\Delta$ at a vertex. This last condition suggests
that the vertices of $\Delta$ play a special role so, in analogy with Nash games, we define:

Definition 2.6. A Wardrop equilibrium $q$ will be called strict if a) $q$ is pure:
$q = \sum_i \rho_i e_{i,\alpha_i}$, $\alpha_i \in \mathcal{A}_i$; and b) $\omega_{i,\alpha_i}(q) < \omega_{i\beta}(q)$ for all paths $\beta \in \mathcal{A}_i \setminus \{\alpha_i\}$.

In Nash games, a pure equilibrium occasionally fails to be strict, but only by a
hair: an arbitrarily small perturbation of a player's pure payoffs $u_{i,\alpha_1 \dots \alpha_N}$ resolves a
pure equilibrium into a strict one without affecting the payoffs of the other players.
In congestion models however, there is no such guarantee because, whenever two
users' paths overlap, one cannot perturb the delays of one user independently of the
other's. As a matter of fact, the existence of a strict Wardrop equilibrium actually
precludes the existence of any other equilibria:

Proposition 2.7. Let $\mathfrak{C}$ be a congestion model. If $q$ is a strict Wardrop equilibrium
of $\mathfrak{C}$, then $q$ is the unique Wardrop equilibrium of $\mathfrak{C}$.

Proof. Without loss of generality, let $q = \sum_i \rho_i e_{i,0}$ be a strict Wardrop equilibrium
of $\mathfrak{C}$ and suppose ad absurdum that $q' \ne q$ is another Wardrop flow. If we
set $z = q' - q \in \ker Q$, it follows that the convex combinations $q + \theta z$ will also be
Wardrop for all $\theta \in [0, 1]$; moreover, for small enough $\theta > 0$, $q + \theta z$ employs at least
one path $\mu \in \mathcal{A}_i^* \equiv \mathcal{A}_i \setminus \{0\}$ that is not present in $q$ (recall that $q$ is pure). As a result,
we get $\omega_{i\mu}(q + \theta z) = \omega_{i,0}(q + \theta z)$ for all sufficiently small $\theta > 0$, and because the
latency functions are continuous, this yields $\omega_{i\mu}(q) = \omega_{i,0}(q)$. However, since
$q$ is a strict Wardrop equilibrium which does not employ $\mu$, we must also have
$\omega_{i,0}(q) < \omega_{i\mu}(q)$, a contradiction. □

In other words, even if $q$ is a strict equilibrium of a reducible network, the
redundant directions which constitute the affine subspace $q + \ker Q$ will only
intersect $\Delta$ at $q$. On the other hand, if $q$ is merely a pure equilibrium, $q + \ker Q$
might well intersect the open interior of $\Delta$; in that case, there is no arbitrarily small
perturbation of the delay functions that could make $q$ into a strict equilibrium.

Equilibria and Objectives. On account of the above, we will focus our investigations
on the concept of Wardrop equilibrium. However, we should mention here that
this equilibrial notion can also be reconciled (to some extent at least) with the
optimisation objectives represented by the payoffs (2.17) and (2.18).

First, with respect to the average delays $\omega_i(x) = \rho_i^{-1} \sum_\alpha x_{i\alpha}\, \omega_{i\alpha}(x)$, the optimal
traffic distributions which minimise the aggregate delay $\omega(x) = \sum_i \rho_i\, \omega_i(x)$ coincide
with the Wardrop equilibria of a suitably modified game. This was first noted
by Beckmann, McGuire and Winsten (1956), who observed the inherent duality
in Wardrop's principle: just as Wardrop equilibria occur at the minimum of the
Rosenthal potential, so can one obtain the minimum of the aggregate latency $\omega$ by
looking at the Wardrop equilibria of an associated congestion model. More precisely,
the only change that needs to be made is to consider the "marginal" latency functions
$\phi_r^*(y_r) = \phi_r(y_r) + y_r\, \phi_r'(y_r)$ (see also Roughgarden and Tardos, 2004). Then,
to study these "socially optimal" flows, we simply have to redress our analysis to fit
these "marginal latencies" instead (see Section 5 for more details).

Secondly, Wardrop equilibria also have close ties with the Nash condition (NE2) 
which corresponds to the "worst-delays" (2.18). Specifically, one can easily see that 
the Nash condition (NE2) is equivalent to the Wardrop condition (2.9) when every 
user only has 2 possible paths to choose from (every amount of traffic diverted from 
one path increases the delay at the user's other path). However, if a user has 3 or 
more paths at his disposal, then the situation can change dramatically because of 
Braess's paradox (Braess, 1968). 

Fig 2. Braess's paradox and the disparity between Wardrop and Nash equilibria.

The essence of this paradox is that there exist networks which perform better
if one removes their fastest link. An example of such a network is given in Fig. 2,
where it is assumed that a user seeks to route 6 units of traffic from A to D using
the three paths A → B → D (blue), A → C → D (red) and A → B → C → D
(green). In that case, the Wardrop condition (WEQ) calls for equidistribution: 2
units are routed via each path, leading to a delay of 92 time units along all paths.
Alternatively, if the user sends 3 traffic units via the red and blue paths and ignores
the green one, all traffic will experience a delay of 83. Paradoxically, even though
the green path has a latency of only 70, the Nash condition (NE2) is satisfied: if
traffic is diverted from, say, the red path to the faster green one, then the latency
of the blue path will also increase, thus increasing the worst delay $\tilde\omega_i$ as well.
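
The figures quoted above can be reproduced with a few lines of code. Since the latency functions of Fig. 2 are not spelled out in the text, the sketch below assumes the classical Braess latencies ($10y$ on A → B and C → D, $y + 50$ on A → C and B → D, $y + 10$ on B → C); this choice is an assumption, made because it yields exactly the delays 92, 83 and 70 mentioned in the discussion.

```python
import numpy as np

# Rows: edges AB, AC, BD, CD, BC; columns: routes ABD (blue), ACD (red), ABCD (green).
P = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)
a = np.array([10.0, 1.0, 1.0, 10.0, 1.0])   # assumed latency slopes
b = np.array([0.0, 50.0, 50.0, 0.0, 10.0])  # assumed latency offsets


def delays(x):
    """Delay of each route when the route flows are x = (blue, red, green)."""
    return P.T @ (a * (P @ x) + b)


def worst_delay(x):
    """Largest delay among the routes that actually carry traffic, cf. (2.18)."""
    return delays(x)[x > 0].max()


wardrop = np.array([2.0, 2.0, 2.0])
split = np.array([3.0, 3.0, 0.0])
diverted = np.array([3.0, 2.9, 0.1])    # move 0.1 units from red to green

print(delays(wardrop))      # [92. 92. 92.]
print(delays(split))        # [83. 83. 70.]
print(worst_delay(split), worst_delay(diverted))
```

Under these assumed latencies, diverting a small amount of traffic to the green path raises the load on the edge A → B, so the blue path slows down and the worst delay exceeds 83.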

This paradox is what led to the original investigations into the efficiency of selfish
routing (Koutsoupias and Papadimitriou, 1999; Roughgarden and Tardos, 2002), 
and it seems that it is also what causes this disparity between Wardrop and Nash 
equilibria. A thorough investigation of this matter is a worthy project but, since it 
would take us too far afield, we will not pursue it here. Henceforward, we will focus 
almost exclusively on Wardrop flows, which represent the most relevant equilibrium 
concept for our purposes. 

3. Learning, Evolution and Rational Behaviour. Unfortunately, locating 
the Wardrop equilibria of a network is a rather arduous process which entails a 
good deal of global calculations (namely the minimisation of a nonlinear convex 
functional with exponentially many variables over a convex polytope). Since such 
calculations clearly exceed the deductive capabilities of individual users (especially 
if they do not have access to global information), it is of great interest to see whether
there are simple learning schemes which allow users to reach an equilibrium without 
having to rely on centralised computations. 

3.1. Learning and the Replicator Dynamics. For our purposes, a learning scheme 
will be a rule which trains users to route their traffic in an efficient way by process- 
ing information that is readily available. On the other hand, since this information 
must be "local" in nature, the learning scheme should be similarly "distributed": for 
example, the play of one's opponents or the exact form of the network's latency 
functions are not easily accessible pieces of information. Furthermore, we should 
also be looking for a learning scheme which is simple enough for users to apply in 
real-time, without having to perform a huge number of calculations at each instant. 

In continuous time, such a learning scheme may be cast as a dynamical system:

(3.1) $\frac{dx}{dt} = v(x)$ or, in coordinates: $\frac{dx_{i\alpha}}{dt} = v_{i\alpha}(x),$

where $x(t) \in \Delta$ denotes the flow at time $t$ and the vector field $v : \Delta \to \mathbb{R}^{\mathcal{A}}$ plays
the part of the "learning rule" in question - for simplicity, we will also take $v$ to be
smooth. Of course, since the flow $x(t)$ evolves in $\Delta$, $v$ itself must lie on the tangent
space $Z$ of $\Delta$; we thus require that $\sum_\alpha v_{i\alpha}(x) = 0$ for all $i \in \mathcal{N}$.

Furthermore, v should also leave the faces of A invariant in the sense that any 
individual trajectory $x_i(t)$ that begins at some face of $\Delta_i$ must always remain in
said face. This is actually an essential consequence of our postulates: if a user does 
not employ a particular route a G Ai, then he has no information on the route and, 
as such, there is no a priori reason that an adaptive learning rule would induce the 
user to sample it. In effect, such a learning rule would either fail to rely solely on 
readily observable information or would not necessarily be a very simple one. 

This shows that $v_{i\alpha}(x)$ must vanish if $x_{i\alpha} = 0$, so if we set $v_{i\alpha}(x) = x_{i\alpha}\, \tilde v_{i\alpha}(x)$,
we obtain the orthogonality condition $\sum_\alpha x_{i\alpha}\, \tilde v_{i\alpha}(x) = 0$. Accordingly, $\tilde v_{i\alpha}$ may be
written in the form:

(3.2) $\tilde v_{i\alpha}(x) = u_{i\alpha}(x) - u_i(x)$

where the $u_{i\alpha}$ satisfy no further constraints and, as can be shown by a simple
summation, the function $u_i(x)$ is just the user average: $u_i(x) = \rho_i^{-1} \sum_\beta x_{i\beta}\, u_{i\beta}(x)$
(recall that $\sum_\beta x_{i\beta} = \rho_i$). This shows that any learning rule which leaves the faces
of $\Delta$ invariant must necessarily be of the form:

(3.3) $\frac{dx_{i\alpha}}{dt} = x_{i\alpha} \left( u_{i\alpha}(x) - u_i(x) \right).$

Dynamics of this type were first derived in the context of population biology by
Taylor and Jonker (1978), initially for different genotypes within a species
(single-population models), and then for different species altogether (multi-population
models; Weibull (1995) provides an excellent survey). In these evolutionary games,
the key objects of interest are large populations of different species, each of them
subdivided into distinct genotypes that are "programmed" to a specific behaviour
(e.g. "hawks" fight, while "doves" take flight). Then, at each instance of biological
interaction, it is assumed that one representative from each species is selected at
random, and they are all matched to play some Nash game $\mathfrak{G}$ whose payoffs represent
a proportionate increase in their reproductive fitness (measured by the number
of offspring per unit of time). In this fashion, if $u_{i\alpha}(x)$ denotes the population
average of the payoff to the $\alpha$-th genotype, it turns out that the evolution of the
species will be governed by the replicator dynamics (3.3).

In our case, the most natural choice for the payoffs $u_{i\alpha}$ of (3.3) is to use the
delay functions $\omega_{i\alpha}(x)$ and set $u_{i\alpha} = -\omega_{i\alpha}$. In so doing, we obtain:

(3.4) $\frac{dx_{i\alpha}}{dt} = x_{i\alpha} \left( \omega_i(x) - \omega_{i\alpha}(x) \right).$

In keeping with our "local information" mantra, we see that users do not need to
know the delays along paths that they do not employ because the replicator vector
field vanishes when $x_{i\alpha} = 0$. Thus, users that evolve according to (3.4) are oblivious
to their surroundings, even to the existence of other users: they simply use (3.4) to
respond to the stimuli $\omega_{i\alpha}(x)$ in the hope of minimising their delays.

Alternatively, if players learn at different rates $\lambda_i > 0$ as a result of varied
stimulus-response characteristics, we obtain the rate-adjusted dynamics:

(3.5) $\frac{dx_{i\alpha}}{dt} = \lambda_i x_{i\alpha} \left( \omega_i(x) - \omega_{i\alpha}(x) \right)$

(naturally, the uniform case (3.4) is recovered when all players learn at the "standard"
rate $\lambda_i = 1$). Interestingly enough, these learning rates can also be viewed as
(player-specific) inverse temperatures: in high temperatures (small $\lambda_i$), the differences
between routes are toned down and players evolve along the slow time-scales
$\lambda_i t$; at the other end of the spectrum, if $\lambda_i \to \infty$, equation (3.5) "freezes" to a rigid
(and myopic) best-reply process (see also Börgers and Sarin, 1997).
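
To see the rate-adjusted dynamics at work, here is a minimal forward-Euler sketch for a single user routing 6 units of traffic. The Braess-type network, its linear latencies, the initial flow, the step size and the function name `replicator` are all illustrative assumptions; an explicit Euler scheme is used only because it is the simplest discretisation, not because the text prescribes it.

```python
import numpy as np

# Rows: edges AB, AC, BD, CD, BC; columns: routes ABD, ACD, ABCD (assumed network).
P = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)
a = np.array([10.0, 1.0, 1.0, 10.0, 1.0])   # assumed latency slopes
b = np.array([0.0, 50.0, 50.0, 0.0, 10.0])  # assumed latency offsets


def delays(x):
    """Route delays for the route flows x."""
    return P.T @ (a * (P @ x) + b)


def replicator(x0, lam=1.0, dt=1e-3, steps=5000):
    """Forward-Euler integration of dx/dt = lam * x * (average delay - delay)."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        w = delays(x)
        w_bar = x @ w / x.sum()           # traffic-weighted average delay
        x = x + dt * lam * x * (w_bar - w)
    return x


x_end = replicator([4.0, 1.5, 0.5])
print("final flow  :", np.round(x_end, 4))
print("final delays:", np.round(delays(x_end), 4))
```

A larger value of `lam` speeds up the evolution by the same factor, so the step size `dt` would have to shrink accordingly for the explicit scheme to remain stable.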

3.2. Entropy and Rationality. An immediate observation concerning the replicator
dynamics (3.5) is that Wardrop equilibria are rest points: if $q$ is a Wardrop
flow, the characterisation (WEQ') gives $\omega_{i\alpha}(q) = \omega_i(q)$ whenever $q_{i\alpha} > 0$. However,
the same holds for all flows $q'$ which exhibit equal latencies along the paths
in their support, and these flows are not necessarily Wardrop (in the terminology
of Sandholm (2001), this means that the replicator dynamics are "complacent").
Consequently, the issue at hand is whether or not the replicator dynamics manage
to single out Wardrop equilibria among other stationary states.

In that direction, if $y^*$ is the minimum of the Rosenthal potential $\Phi(y)$, it is
easy to see that the function $F_0(x) = \Phi(P(x)) - \Phi(y^*)$ is a semi-definite Lyapunov
function for the dynamics (3.5). Indeed, $F_0$ vanishes on the Wardrop set $\Delta^*$, is
positive otherwise, and its evolution under (3.5) satisfies:

(3.6) $\frac{dF_0}{dt} = \sum_{i,\alpha} \frac{\partial F_0}{\partial x_{i\alpha}} \frac{dx_{i\alpha}}{dt} = -\sum_i \lambda_i \rho_i \left[ \rho_i^{-1} \sum_\alpha x_{i\alpha}\, \omega_{i\alpha}^2(x) - \omega_i^2(x) \right] \le 0,$

the last step following from Jensen's inequality - equality only holds when $\omega_{i\alpha}(x) =
\omega_i(x)$ for all $\alpha \in \mathrm{supp}(x)$. Thus, by standard results in the theory of dynamical
systems, it follows that the solution orbits of (3.5) descend the potential $F_0$ and
eventually converge to a connected subset of rest points - see also Sandholm (2001),
where the property (3.6) is referred to as "positive correlation".

Nevertheless, since not all stationary points of (3.5) are Wardrop equilibria, this
result tells us little about the rationality properties of the replicator dynamics in
congestion models. A much more important role is played by the relative entropy
(also known as the Kullback-Leibler divergence):

(3.7) $H_q(x) \equiv d_{KL}(q, x) = \sum_i \sum_{\alpha \in \mathrm{supp}(q)} q_{i\alpha} \log \frac{q_{i\alpha}}{x_{i\alpha}},$

where the sum is taken over the support of $q$: $\mathrm{supp}(q) = \{\alpha : q_{i\alpha} > 0\}$. Of course,
this sum is finite only when $x$ employs with positive probability all $\alpha \in \mathcal{A}$ that are
present in $q$; that is, the domain of definition of $H_q$ is $\Delta_q \equiv \{x \in \Delta : q \ll x\}$ (here,
"$\ll$" denotes absolute continuity of measures). Even so though, it will matter little
if we extend $H_q$ continuously to all of $\Delta$ by setting $H_q = \infty$ outside $\Delta_q$, so we will
occasionally act as if $H_q$ were defined over all of $\Delta$.

Technicalities aside, the significance of the relative entropy lies in that it measures
distance in probability space. Indeed, even though it is not a distance function per
se (it fails to be symmetric and does not satisfy the triangle inequality), it is positive
definite (see below) and strictly convex (Weibull, 1995). More importantly for our
purposes, it is also a (semi-definite) Lyapunov function for the dynamics (3.4):

Lemma 3.1. Let $\mathfrak{C}(\mathcal{Q}, \{\phi_r\})$ be a congestion model with increasing latencies,
and let $q \in \Delta$ be a Wardrop flow of $\mathfrak{C}$. Then, the relative entropy $H_q(x)$ satisfies:

1. $H_q(q) = 0$ and $H_q(x) > 0$ for all $x \ne q$;

2. $\dot H_q$ vanishes on the Wardrop set $\Delta^*(\mathfrak{C})$ and is negative otherwise (where $\dot H_q$
denotes the time derivative with respect to (3.4): $\dot H_q = \sum_{i,\alpha} \frac{\partial H_q}{\partial x_{i\alpha}}\, \dot x_{i\alpha}$).

In particular, if the network $\mathcal{Q}$ is irreducible ($\mathrm{red}(\mathcal{Q}) = 0$), then $H_q$ is Lyapunov for
the replicator dynamics (3.4).

Proof. The first part of the lemma (positive-definiteness) is an easy consequence
of Jensen's inequality (Weibull, 1995, pp. 95-100). As for the second part:

(3.8) $\dot H_q(x) = \sum_{i,\alpha} \frac{\partial H_q}{\partial x_{i\alpha}}\, \dot x_{i\alpha} = -\sum_{i,\alpha} q_{i\alpha} \left( \omega_i(x) - \omega_{i\alpha}(x) \right) = -L_q(x),$

where we have set $L_q(x) = \sum_{i,\alpha} q_{i\alpha} (\omega_i(x) - \omega_{i\alpha}(x))$. Then, the simple rearrangement
$\sum_\alpha q_{i\alpha}\, \omega_i(x) = \rho_i\, \omega_i(x) = \sum_\alpha x_{i\alpha}\, \omega_{i\alpha}(x)$ and some trivial linear algebra yield:

(3.9) $L_q(x) = \sum_{i,\alpha} (x_{i\alpha} - q_{i\alpha})\, \omega_{i\alpha}(x) = \sum_r (y_r - y_r^*)\, \phi_r(y_r) \equiv \Lambda(y),$

where $y = P(x)$ and $y^* = P(q)$ are the load profiles which correspond to the flows
$x$ and $q$ respectively.

We will refer to the expression $L_q$ (or, interchangeably, to $\Lambda$) as the adjoint potential
of $\mathfrak{C}$ because, similarly to the Rosenthal potential $\Phi$, it measures distance from
the Wardrop set $\Delta^*$. The properties of $L_q$ will be discussed at length in Appendix
A where, among others, we establish the easy (but crucial!) inequality:

(3.10) $\Lambda(y) \ge \Phi(y) - \Phi(y^*).$

Hence, with $y^*$ being the global minimum of $\Phi$, we conclude that $\Lambda$ is positive
definite and the lemma follows by noting that $P(x) = y^*$ iff $x$ is Wardrop. □
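
The two identities used in the proof lend themselves to a quick numerical sanity check. The sketch below again assumes a single user with 6 units of traffic on a Braess-type network with linear latencies (illustrative data, not from the text), for which the equidistributed flow $q = (2, 2, 2)$ equalises all route delays and is therefore Wardrop. At a few random interior flows it compares the derivative of the relative entropy along the replicator field with the negative adjoint potential, and the adjoint potential with the potential gap $\Phi(y) - \Phi(y^*)$.

```python
import numpy as np

# Rows: edges AB, AC, BD, CD, BC; columns: routes ABD, ACD, ABCD (assumed network).
P = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)
a = np.array([10.0, 1.0, 1.0, 10.0, 1.0])   # assumed latency slopes
b = np.array([0.0, 50.0, 50.0, 0.0, 10.0])  # assumed latency offsets
rho = 6.0
q = np.array([2.0, 2.0, 2.0])               # Wardrop flow of this example


def delays(x):
    return P.T @ (a * (P @ x) + b)


def rosenthal(y):
    """Rosenthal potential in closed form for linear latencies."""
    return np.sum(0.5 * a * y**2 + b * y)


def entropy(x):
    """Relative entropy of q with respect to x (q has full support here)."""
    return np.sum(q * np.log(q / x))


def check(x, h=1e-7):
    """Return (dH/dt along the replicator field, adjoint potential, potential gap)."""
    w = delays(x)
    v = x * (x @ w / rho - w)                        # replicator field at x
    dH = (entropy(x + h * v) - entropy(x - h * v)) / (2 * h)
    L = (x - q) @ w                                  # adjoint potential at x
    gap = rosenthal(P @ x) - rosenthal(P @ q)
    return dH, L, gap


rng = np.random.default_rng(0)
samples = [rho * rng.dirichlet(np.ones(3)) for _ in range(5)]
results = [check(x) for x in samples]
for dH, L, gap in results:
    print(f"dH/dt = {dH:+.4f}   -L_q = {-L:+.4f}   potential gap = {gap:.4f}")
```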

Remark 1 (The Rate-adjusted Case). It is also reasonable to ask whether the
relative entropy function enjoys the same properties in the rate-adjusted dynamics
(3.5). Unfortunately, this is not true unless all players learn at the same rate;
however, if we consider the rate-adjusted relative entropy:

(3.11) $H_q(x; \lambda) = \sum_i \lambda_i^{-1} \sum_{\alpha \in \mathrm{supp}(q)} q_{i\alpha} \log \frac{q_{i\alpha}}{x_{i\alpha}},$

the same calculations show that $\dot H_q(x; \lambda) = -L_q(x)$ and provide us with the analogue
of Lemma 3.1 for the rate-adjusted dynamics (3.5).

In view of the above, it would be tempting to infer that the replicator dynamics
converge to Wardrop equilibrium. Nevertheless, a semi-definite Lyapunov function
is not enough to guarantee convergence by itself, even if we rule out the existence
of limit cycles. For instance, if we consider the homogeneous system:

(3.12) $\dot x = yz, \quad \dot y = -xz, \quad \dot z = -z^2,$

with $z > 0$, we see that it admits the semi-definite Lyapunov function $H(x, y, z) =
x^2 + y^2 + z^2$ whose time derivative only vanishes on the x-y plane. However, the
general solution of (3.12) in cylindrical coordinates $(\rho, \varphi, z)$ is just:

(3.13) $\rho(t) = \rho_0, \quad \varphi(t) = \varphi_0 - \log(1 + z_0 t), \quad z(t) = \frac{z_0}{1 + z_0 t},$

and this represents a helix of constant radius whose coils become topologically
dense as the solution orbits approach the x-y plane. We thus see that the solutions
of (3.12) approach a set of stationary points, but do not converge to a specific one.
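
This counterexample is easy to verify numerically. The sketch below checks by central finite differences that the closed-form helix satisfies the system (3.12), and then prints the height and the accumulated rotation at a few late times; the initial data $\rho_0 = 1$, $\varphi_0 = 0$, $z_0 = 1$ and the function names are arbitrary choices made for illustration.

```python
import numpy as np


def field(p):
    """Right-hand side of the system: (y z, -x z, -z^2)."""
    x, y, z = p
    return np.array([y * z, -x * z, -z**2])


def helix(t, rho0=1.0, phi0=0.0, z0=1.0):
    """Closed-form solution in Cartesian coordinates."""
    phi = phi0 - np.log1p(z0 * t)
    return np.array([rho0 * np.cos(phi), rho0 * np.sin(phi), z0 / (1.0 + z0 * t)])


def residual(t, h=1e-6):
    """Largest mismatch between the numerical derivative and the vector field."""
    derivative = (helix(t + h) - helix(t - h)) / (2 * h)
    return np.max(np.abs(derivative - field(helix(t))))


residuals = [residual(t) for t in (0.0, 1.0, 10.0, 100.0)]
print("max residual:", max(residuals))

for t in (1e2, 1e4, 1e6):
    height = helix(t)[2]
    turns = np.log1p(t) / (2 * np.pi)   # total rotation so far, in full turns
    print(f"t = {t:.0e}: z = {height:.2e}, rotation = {turns:.2f} turns")
```

The height decays like $1/t$ while the rotation grows like $\log t$, so the orbit keeps circling even as it flattens onto the plane of stationary points.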

That said, there is much more at work in the replicator dynamics (3.5) than a
single semi-definite Lyapunov function: there exists a whole family of such functions,
one for each Wardrop flow $q \in \Delta^*$. So, undeterred by potential pathologies, the
replicator dynamics actually do converge to equilibrium:

Theorem 3.2. Let $\mathfrak{C}(\mathcal{Q}, \{\phi_r\})$ be a congestion model in a network $\mathcal{Q}$. Then,
every interior solution trajectory of the replicator dynamics (3.5) converges to a
Wardrop equilibrium of $\mathfrak{C}$; in particular, if the network $\mathcal{Q}$ is irreducible, $x(t)$
converges to the unique Wardrop equilibrium of $\mathfrak{C}$.

Proof. It will be useful to shift our point of view to the evolution function
$\theta(x, t)$ of the dynamics (3.5) which describes the solution trajectory that starts at
$x$ at time $t = 0$ and which satisfies the consistency condition:

(3.14) $\theta(x, t + s) = \theta(\theta(x, t), s)$ for all $t, s \ge 0$ and for all $x \in \Delta$.

Now, fix the initial condition $x \in \mathrm{Int}(\Delta)$ and let $x(t) = \theta(x, t)$ be the corresponding
solution orbit. If $q \in \Delta^*$ is a Wardrop equilibrium of $\mathfrak{C}$, then, in view of Lemma
3.1, the function $V_q(t) \equiv H_q(\theta(x, t))$ will be decreasing and will converge to some
$m \ge 0$ as $t \to \infty$. It thus follows that $x(t)$ converges itself to the level set $H_q^{-1}(m)$.

Suppose now that there exists some increasing sequence of times $t_n \to \infty$ such
that $x_n \equiv x(t_n)$ does not converge to $\Delta^*$. By compactness of $\Delta$ (and by descending
to a subsequence if necessary), we may assume that $x_n = \theta(x, t_n)$ converges to some
$x^* \notin \Delta^*$ (but necessarily in $H_q^{-1}(m)$). Hence, for any $t > 0$:

(3.15) $H_q(\theta(x, t_n + t)) = H_q(\theta(\theta(x, t_n), t)) \to H_q(\theta(x^*, t)) < H_q(x^*) = m$

where the (strict) inequality stems from the fact that $\dot H_q < 0$ outside $\Delta^*$. On the
other hand, $H_q(\theta(x, t_n + t)) = V_q(t_n + t) \to m$, a contradiction.

Since the sequence $t_n$ was arbitrary, this shows $x(t)$ converges to the set $\Delta^*$. So,
let $q'$ be a limit point of $x(t)$ with $x(t_n') \to q'$ for some sequence of times $t_n' \to \infty$.
Then, $V_{q'}(t_n') \equiv H_{q'}(x(t_n'))$ will converge to zero and, with $V_{q'}$ decreasing, we will
have $\lim_{t \to \infty} V_{q'}(t) = 0$ as well. Seeing as $H_{q'}$ only vanishes at $q'$, we conclude that
$x(t) \to q'$. □

Fig 3. The various sets in the proof of Theorem 3.2.



Remark 1 (Previous Work). In the context of potential games, Sandholm
(2001) examined a class of learning dynamics $v_{i\alpha}$ which are "positively correlated"
to the game's payoff functions $u_{i\alpha} = -\omega_{i\alpha}$, in the sense that $\sum_{i,\alpha} v_{i\alpha}(x)\, u_{i\alpha}(x) > 0$.
It was then shown that if the rest points of these dynamics coincide with the game's
Wardrop equilibria (the "non-complacency" condition), then all solution orbits
converge to the set of Wardrop equilibria. Unfortunately, as we have already pointed out,
the replicator dynamics are "complacent" and, in that case, Sandholm's results only
ensure that Wardrop equilibria are Lyapunov stable.

To the best of our knowledge, the stronger convergence properties of Theorem
3.2 were first suggested by Fischer and Vöcking (2004) who identified the link
between Wardrop equilibrium and evolutionary stability (Maynard Smith, 1974). In
particular, the authors showed that Wardrop equilibria are robust against "mutations"
that lead to greater delays but, in networks with more than one user (the
"multi-commodity" case as they call it), their approach rests heavily on the
(implicit) assumption of irreducibility. If this is not the case, the adjoint potential $L_q$
is only positive semi-definite and the approach of Fischer and Vöcking breaks down
because Wardrop equilibria are only neutrally stable - this is also the problem with
the formulation of Corollary 5.1 in Sandholm (2001).

Remark 2 (Non-interior Trajectories). One might also ask what happens if
the initial condition $x(0)$ is not an interior point of $\Delta$. Clearly, if $x(0)$ does not
employ the routes that are present in a Wardrop flow $q$, $x(t)$ cannot have $q$ as a
limit point - a simple consequence of the fact that the replicator dynamics leave
the faces of $\Delta$ invariant. All the same, one can simply quotient out the routes that
are not initially present until $x(0)$ becomes an interior point in the reduced strategy
space $\Delta_{\mathrm{eff}}$ that ensues. In that case, Theorem 3.2 can be applied to the (similarly
reduced) congestion model $\mathfrak{C}_{\mathrm{eff}}$ to show that $x(t)$ converges to Wardrop equilibrium
in $\mathfrak{C}_{\mathrm{eff}}$ (cf. the "restricted equilibria" of Fischer and Vöcking, 2004).

Remark 3 (Evolution and Friction). Since the replicator trajectories converge
to a Wardrop equilibrium, it follows that there can be no limit cycles. On the
other hand, limit cycles are a common occurrence in evolutionary games: for
example, Matching Pennies and Rock-Paper-Scissors both exhibit limit cycles in the
standard replicator dynamics (Weibull, 1995). So, while the evolutionary energy
of large populations may remain undiminished over time, Theorem 3.2 shows that
congestion models are dissipative and traffic flows settle down to a steady state.

4. The Effect of Stochastic Fluctuations. Going back to our original discussion
on learning schemes, we see that the users' evolution hinges on the feedback
that they receive about their choices, namely the delays $\omega_{i\alpha}(x)$ that they record.
We have already noted that this information is based on actual observations, but
this does not necessarily mean that it is also accurate. For instance, the
interference of nature with the game or imperfect readings of one's payoffs might
perturb this information considerably; additionally, if the users' traffic flows are
not continuous in time but consist of discrete segments instead (e.g. datagrams
in communication networks), the queueing latencies $\omega_{i\alpha}$ only represent the users'
expected delays. Hence, the delays that users actually observe might only be a
randomly fluctuating estimate of the underlying payoffs, and this could negatively
affect the rationality properties of the replicator dynamics.

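4.1. Stochastic Replicator Dynamics. Our goal here will be to determine the
behaviour of the replicator dynamics under stochastic perturbations of the kind
outlined above. To that end, write the delay that users experience along the edge
$r \in \mathcal{E}$ as $\hat\phi_r = \phi_r + \eta_r$ where $\eta_r$ denotes the perturbation process. Then, the
latency $\hat\omega_{i\alpha}$ along $\alpha \in \mathcal{A}_i$ will just be $\hat\omega_{i\alpha} = \omega_{i\alpha} + \eta_{i\alpha}$ where, in obvious notation,
$\eta_{i\alpha} = \sum_r P^i_{r\alpha}\, \eta_r$. In this way, the replicator dynamics (3.4) become:

(4.1) $\frac{dx_{i\alpha}}{dt} = x_{i\alpha} \left( \hat\omega_i - \hat\omega_{i\alpha} \right) = x_{i\alpha} \left( \omega_i - \omega_{i\alpha} \right) + x_{i\alpha} \left( \eta_i - \eta_{i\alpha} \right)$

where $\hat\omega_i = \rho_i^{-1} \sum_\beta x_{i\beta}\, \hat\omega_{i\beta}$ and $\eta_i = \rho_i^{-1} \sum_\beta x_{i\beta}\, \eta_{i\beta}$.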

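The exact form of the perturbations $\eta_r$ clearly depends on the particular situation
at hand. Still, since we are chiefly interested in stochastic fluctuations around the
underlying delays $\omega_{i\alpha}$, it is reasonable to take these perturbations to be some sort
of white noise that does not bias users towards one direction or another. In that
case, we should rewrite (4.1) as a stochastic differential equation:

(4.2) $dX_{i\alpha} = X_{i\alpha} \left[ \omega_i(X) - \omega_{i\alpha}(X) \right] dt + X_{i\alpha} \Big[ dU_{i\alpha} - \rho_i^{-1} \sum_\beta X_{i\beta}\, dU_{i\beta} \Big]$

where $dU_{i\alpha}$ describes the total noise along the path $\alpha \in \mathcal{A}_i$:

(4.3) $dU_{i\alpha} = \sum_{r \in \alpha} \sigma_r\, dW_r = \sum_r P^i_{r\alpha}\, \sigma_r\, dW_r,$

and $W(t) = \sum_r W_r(t)\, e_r$ is a Wiener process in $\mathbb{R}^{\mathcal{E}}$, the space spanned by the edges
$\mathcal{E}$ of the network. Similarly, if players learn at different rates $\lambda_i$, we get:

(4.4) $dX_{i\alpha} = \lambda_i X_{i\alpha} \left[ \omega_i(X) - \omega_{i\alpha}(X) \right] dt + \lambda_i X_{i\alpha} \Big[ dU_{i\alpha} - \rho_i^{-1} \sum_\beta X_{i\beta}\, dU_{i\beta} \Big],$

that is, the drift and diffusion coefficients $b$ and $c$ that appear in (4.2) are both
scaled by the learning rate $\lambda_i$.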

The rate-adjusted equation (4.4) will constitute our stochastic version of the
replicator dynamics and, as such, it warrants some discussion in and of itself. A
first remark to be made concerns the noise coefficients $\sigma_r$: even though we have
written them in a form that suggests they are constant, they need not be so: after
all, the intensity of the noise on an edge might well depend on the edge loads
$Y_r = \sum_\alpha P_{r\alpha} X_\alpha$. On that account, we will only assume that these coefficients
are essentially bounded functions of the loads $y$. Nonetheless, in an effort to reduce
notational clutter, we will not indicate this dependence explicitly; instead, we simply
remark here that our results continue to hold if we replace $\sigma_r$ with the worst-case
scenario $\sigma_r \equiv \operatorname{ess\,sup}_y \sigma_r(y)$.
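
For readers who wish to experiment with the stochastic dynamics, the following Euler-Maruyama sketch simulates one user on a Braess-type network. Everything specific in it is an assumption made for illustration: the network and its linear latencies, the constant noise coefficients `sigma`, the step size, the seed, and the modelling of the path noise as the sum of independent edge increments $\sigma_r\, dW_r$ over the edges of each path. A plain Euler-Maruyama step does not guarantee that the flow stays positive, so the step size is deliberately kept small.

```python
import numpy as np

# Rows: edges AB, AC, BD, CD, BC; columns: routes ABD, ACD, ABCD (assumed network).
P = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)
a = np.array([10.0, 1.0, 1.0, 10.0, 1.0])   # assumed latency slopes
b = np.array([0.0, 50.0, 50.0, 0.0, 10.0])  # assumed latency offsets
sigma = np.full(5, 0.2)                     # assumed constant noise coefficients


def delays(x):
    return P.T @ (a * (P @ x) + b)


def simulate(x0, lam=1.0, dt=1e-3, steps=20000, seed=0):
    """Euler-Maruyama discretisation of the rate-adjusted stochastic dynamics."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    path = np.empty((steps + 1, x.size))
    path[0] = x
    for n in range(steps):
        rho = x.sum()
        w = delays(x)
        w_bar = x @ w / rho
        dW = np.sqrt(dt) * rng.standard_normal(P.shape[0])  # one increment per edge
        dU = P.T @ (sigma * dW)                             # total noise along each path
        x = x + lam * x * (w_bar - w) * dt + lam * x * (dU - x @ dU / rho)
        path[n + 1] = x
    return path


path = simulate([4.0, 1.5, 0.5])
time_average = path[len(path) // 2:].mean(axis=0)
print("total traffic at the end:", path[-1].sum())
print("time average (second half):", time_average)
```

Because routes that share an edge receive the same edge increment, the path noises in this sketch are correlated exactly when the paths overlap; the noise term also sums to zero over a user's routes, so the total traffic is conserved at every step.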

Secondly, it is also important to compare (4.4) to other stochastic incarnations
of the replicator dynamics, namely the "aggregate shocks" version of Fudenberg and
Harris (1992) and the authors' own "exponential learning" approach (Mertikopoulos
and Moustakas, 2009a,b). In the case of the former, one perturbs the replicator
equation (3.3) by accounting for the (stochastic) interference of nature with
reproduction rates (Fudenberg and Harris, 1992; Imhof, 2005):

(4.5)  $dX_{i\alpha} = X_{i\alpha}\,[u_{i\alpha}(X) - u_i(X)]\,dt - X_{i\alpha}\,\big[\sigma_{i\alpha}^2 X_{i\alpha} - \sum_\beta \sigma_{i\beta}^2 X_{i\beta}^2\big]\,dt + X_{i\alpha}\,\big[\sigma_{i\alpha}\,dW_{i\alpha} - \sum_\beta \sigma_{i\beta} X_{i\beta}\,dW_{i\beta}\big],$

where $W = \sum_{i,\alpha} W_{i\alpha} e_{i\alpha}$ is a Wiener process in $\prod_i \mathbb{R}^{\mathcal{A}_i}$. Then, if the "aggregate
shocks" $\sigma_{i\alpha}$ are mild enough, Cabrales (2000) and Imhof (2005) showed that
dominated strategies become extinct and that the game's strict Nash equilibria are
asymptotically stable with arbitrarily high probability.

By comparison, in the "exponential learning" case it is assumed that the players
of a Nash game employ a learning scheme akin to logistic fictitious play (Fudenberg
and Levine, 1998, pp. 118-129). However, if the information that players have is
imperfect, the errors propagate to their learning curves and instead lead to the
stochastic dynamics:

(4.6)  $dX_{i\alpha} = \lambda_i X_{i\alpha}\,[u_{i\alpha}(X) - u_i(X)]\,dt + \lambda_i X_{i\alpha}\,\big[\sigma_{i\alpha}\,dW_{i\alpha} - \sum_\beta \sigma_{i\beta} X_{i\beta}\,dW_{i\beta}\big] + \frac{\lambda_i^2}{2} X_{i\alpha}\,\big[\sigma_{i\alpha}^2 (1 - 2X_{i\alpha}) - \sum_\beta \sigma_{i\beta}^2 X_{i\beta}(1 - 2X_{i\beta})\big]\,dt.$



The rationality properties of these learning dynamics are somewhat stronger than
in the biological setting: irrespective of the perturbations' magnitude, strategies
which are not rationally admissible die out at an exponential rate and the strict
Nash equilibria of the game are always (stochastically) stable (Mertikopoulos and
Moustakas, 2009a,b).

In light of the above, there are two notable traits of (4.4) that set it apart
from its other stochastic versions. First off, the drift of (4.4) coincides with the
deterministic replicator dynamics (3.5) whereas the drift coefficients of (4.5) and
(4.6) do not. On the other hand, the martingale processes $U$ that appear in (4.4)
are not uncorrelated components of some Wiener process (as is the case for both
(4.5) and (4.6)): instead, depending on whether the paths $\alpha, \beta \in \mathcal{A}$ have edges in
common or not, the processes $U_\alpha, U_\beta$ might be highly correlated or not at all.

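To make this last observation more precise, recall that the Wiener differentials
$dW_r$ are orthogonal: $dW_r \cdot dW_s = d[W_r, W_s] = \delta_{rs}\,dt$. In its turn, this implies that
the stochastic differentials $dU_\alpha, dU_\beta$ satisfy:

(4.7)  $dU_\alpha \cdot dU_\beta = \big(\sum_r P_{r\alpha}\sigma_r\,dW_r\big) \cdot \big(\sum_s P_{s\beta}\sigma_s\,dW_s\big) = \sum_{r,s} P_{r\alpha}P_{s\beta}\,\sigma_r\sigma_s\,\delta_{rs}\,dt = \sum_{r\in\alpha\beta} \sigma_r^2\,dt = \sigma_{\alpha\beta}^2\,dt,$

where $\sigma_{\alpha\beta}^2 = \sum_r P_{r\alpha}P_{r\beta}\,\sigma_r^2$ gives the variance of the noise along the intersection
$\alpha\beta \equiv \alpha\cap\beta$ of the paths $\alpha, \beta \in \mathcal{A}$ (note also that we used our notational conventions
to avoid cumbersome expressions such as $\sigma^2_{i\alpha,j\beta}$). Thus, the processes
$U_\alpha$ and $U_\beta$ are uncorrelated iff the paths $\alpha, \beta \in \mathcal{A}$ have no common edges. At the
other extreme, we have:

(4.8)  $(dU_\alpha)^2 = \sum_{r\in\alpha} \sigma_r^2\,dt = \sigma_\alpha^2\,dt,$

where $\sigma_\alpha^2 \equiv \sigma_{\alpha\alpha}^2 = \sum_r P_{r\alpha}\,\sigma_r^2$ measures the intensity of the noise on the route
$\alpha \in \mathcal{A}$.^ These expressions will be key to our analysis and we will make liberal use
of them in the rest of our paper.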
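
As a quick numerical illustration of (4.7) and (4.8), the following sketch builds the matrix of path-noise covariances $\sigma^2_{\alpha\beta} = \sum_r P_{r\alpha}P_{r\beta}\sigma_r^2$ for a small network. The incidence matrix `P`, the noise levels `sigma` and the function name `path_noise_covariance` are hypothetical and serve only as an example.

```python
def path_noise_covariance(P, sigma):
    """Return [sigma2_ab] with sigma2_ab = sum_r P[r][a] * P[r][b] * sigma[r]**2.

    P[r][a] is 1 if edge r belongs to path a and 0 otherwise;
    sigma[r] is the noise intensity on edge r.
    """
    n_edges, n_paths = len(P), len(P[0])
    return [
        [sum(P[r][a] * P[r][b] * sigma[r] ** 2 for r in range(n_edges))
         for b in range(n_paths)]
        for a in range(n_paths)
    ]


# Hypothetical network with 3 edges and 3 paths:
# path 0 uses edges {0, 1}, path 1 uses edges {1, 2}, path 2 uses edge {2}.
P = [[1, 0, 0],
     [1, 1, 0],
     [0, 1, 1]]
sigma = [0.5, 1.0, 2.0]
cov = path_noise_covariance(P, sigma)
```

Paths 0 and 2 share no edge, so `cov[0][2]` vanishes, in line with the remark that the corresponding noise processes are uncorrelated; the diagonal entries reproduce the route intensities of (4.8).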

4.2. Stochastic Fluctuations and Rationality. Our goal in this section will be
to explore the rationality properties of the stochastic replicator dynamics (4.4).
To begin with, note that (4.4) admits a (unique) strong solution for any initial
state $X(0) \equiv x \in \Delta$, even though its coefficients do not necessarily satisfy the
linear growth bound that is commonly required for the existence and uniqueness of
strong solutions to SDEs. Indeed, an addition over $\alpha \in \mathcal{A}_i$ reveals that every
component simplex of $\Delta$ remains invariant under these dynamics: if $X_i(0) = x_i \in \Delta_i$,
then $d\big(\sum_\alpha X_{i\alpha}\big) = 0$ and, hence, $X_i(t)$ stays in $\Delta_i$ for all $t \ge 0$. So, if
$U \supseteq \Delta$ is open and $\phi$ is a smooth bump function on $U$ that is equal to 1 on $\Delta$
and vanishes outside some compact set $K \subseteq U$, the SDE

(4.9)  $dX_{i\alpha} = \lambda_i\,\phi(X)\,\big(b_{i\alpha}(X)\,dt + \sum_\beta c_{i,\alpha\beta}(X)\,dU_{i\beta}\big)$

has bounded coefficients and will thus admit a unique strong solution. But since
this last equation agrees with (4.4) on $\Delta$ and all solutions of (4.4) always stay in
$\Delta$, our claim follows.
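
The invariance argument above can also be observed numerically. The sketch below is a plain Euler-Maruyama discretisation of (4.4), not a scheme taken from the paper: the function name `em_step`, the `latency` callback and the two-link example are assumptions made for illustration. Because the increments of each user's flows sum to zero, the discretised flows keep their total $\rho_i$ up to rounding errors (positivity, by contrast, is only preserved for small enough steps).

```python
import math
import random


def em_step(x, rho, lam, P, latency, sigma, dt, rng):
    """One Euler-Maruyama step of the stochastic replicator dynamics (4.4).

    x[i][a]    : flow of user i on path a
    P[i][r][a] : 1 if edge r lies on path a of user i, else 0
    latency    : hypothetical callback, latency(r, y_r) -> delay on edge r
    """
    n_edges = len(sigma)
    users = range(len(x))
    # Edge loads y_r = sum_{i,a} P_ra * x_ia and the induced edge delays.
    y = [sum(P[i][r][a] * x[i][a] for i in users for a in range(len(x[i])))
         for r in range(n_edges)]
    phi = [latency(r, y[r]) for r in range(n_edges)]
    # Independent Wiener increments, one per edge.
    dW = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_edges)]

    new_x = []
    for i in users:
        paths = range(len(x[i]))
        omega = [sum(P[i][r][a] * phi[r] for r in range(n_edges)) for a in paths]
        dU = [sum(P[i][r][a] * sigma[r] * dW[r] for r in range(n_edges)) for a in paths]
        omega_avg = sum(x[i][a] * omega[a] for a in paths) / rho[i]
        dU_avg = sum(x[i][a] * dU[a] for a in paths) / rho[i]
        new_x.append([
            x[i][a] + lam[i] * x[i][a] * ((omega_avg - omega[a]) * dt + dU[a] - dU_avg)
            for a in paths
        ])
    return new_x


# Hypothetical example: one user splits a unit of traffic over two parallel links
# with linear latencies phi_0(y) = 1 + y and phi_1(y) = 2 * y.
rng = random.Random(0)
P = [[[1, 0], [0, 1]]]
x = [[0.5, 0.5]]
for _ in range(2000):
    x = em_step(x, rho=[1.0], lam=[0.5], P=P,
                latency=lambda r, y: 1.0 + y if r == 0 else 2.0 * y,
                sigma=[0.1, 0.1], dt=1e-3, rng=rng)
total_flow = sum(x[0])
```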

^This notation is also consistent with our intersection notation: $\alpha\alpha = \alpha \cap \alpha = \alpha$.

Now, as in the deterministic setting, our main tool will be the (rate-adjusted)
relative entropy $H_q(x;\lambda) = \sum_i \lambda_i^{-1} \sum_\alpha q_{i\alpha}\log(q_{i\alpha}/x_{i\alpha})$ which we will study with
the help of the generator $L$ of the diffusion (4.4). To that end, recall that the
generator $L$ of the Itô diffusion:

(4.10)  $dX_\alpha(t) = \mu_\alpha(X(t))\,dt + \sum_\beta \sigma_{\alpha\beta}(X(t))\,dW_\beta(t),$

where $W(t)$ is a Wiener process, is just the second order differential operator:

(4.11)  $L = \sum_\alpha \mu_\alpha(x)\,\frac{\partial}{\partial x_\alpha} + \frac{1}{2}\sum_{\alpha,\beta} \big(\sigma(x)\sigma^T(x)\big)_{\alpha\beta}\,\frac{\partial^2}{\partial x_\alpha\,\partial x_\beta}$

(for a comprehensive account, consult the excellent book by Øksendal (2007)). In
this manner, if $f$ is a sufficiently smooth function, $Lf$ captures the drift of the
process $f(X(t))$:

(4.12)  $df(X(t)) = Lf(X(t))\,dt + \sum_{\alpha,\beta} \frac{\partial f}{\partial x_\alpha}\Big|_{X(t)}\,\sigma_{\alpha\beta}(X(t))\,dW_\beta(t).$

Of course, in the case of the diffusion (4.4), the martingales $U$ are not the
components of a Wiener process, so (4.12) cannot be applied right off the shelf. However,
a straightforward application of Itô's lemma (see Appendix B) yields:

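Lemma 4.1. Let $L$ be the generator of (4.4). Then, for any $q \in \Delta$:

(4.13)  $LH_q(x;\lambda) = -L_q(x) + \frac{1}{2}\sum_i \frac{\lambda_i}{\rho_i} \sum_{\beta,\gamma} \sigma_{\beta\gamma}^2\,(x_{i\beta} - q_{i\beta})(x_{i\gamma} - q_{i\gamma}) + \frac{1}{2}\sum_i \frac{\lambda_i}{\rho_i} \sum_{\beta,\gamma} q_{i\beta}\,(\rho_i\delta_{\beta\gamma} - q_{i\gamma})\,\sigma_{\beta\gamma}^2,$

where $L_q(x) \equiv \sum_{i,\alpha} (x_{i\alpha} - q_{i\alpha})\,\omega_{i\alpha}(x)$ is the adjoint potential of (3.9).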

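In a certain sense, this lemma can be viewed as the stochastic analogue of Lemma
3.1 (which is recovered immediately if we set $\sigma = 0$). However, it also shows that the
stochastic situation is much more intricate than the deterministic one. For example,
if $q$ is a Wardrop equilibrium, (4.13) gives:

(4.14)  $LH_q(q;\lambda) = \frac{1}{2}\sum_i \frac{\lambda_i}{\rho_i} \sum_{\beta,\gamma} q_{i\beta}\,(\rho_i\delta_{\beta\gamma} - q_{i\gamma})\,\sigma_{\beta\gamma}^2,$

and if we focus on user $i \in \mathcal{N}$, we readily obtain:

(4.15)  $\sum_{\beta,\gamma} q_{i\beta}\,(\rho_i\delta_{\beta\gamma} - q_{i\gamma})\,\sigma_{\beta\gamma}^2 = \sum_{\beta,\gamma} q_{i\beta}\,(\rho_i\delta_{\beta\gamma} - q_{i\gamma}) \sum_r P^i_{r\beta}P^i_{r\gamma}\,\sigma_r^2 = \rho_i\sum_r \sigma_r^2\,y_{ir} - \sum_r \sigma_r^2\,y_{ir}^2 = \sum_r \sigma_r^2\,y_{ir}(\rho_i - y_{ir}),$

where $y_{ir} = \sum_\alpha P^i_{r\alpha}\,q_{i\alpha} \le \rho_i$ is the load induced on edge $r \in \mathcal{E}$ by the $i$-th user.
This shows that (4.14) is positive if at least one user mixes his routes, thus ruling
out negative definiteness (even semi-definiteness) for $LH_q$.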
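
The identity (4.15) is easy to check numerically for a single user. The sketch below evaluates both sides on a small example; the incidence matrix `P`, the noise levels `sigma`, the flows and both function names are hypothetical.

```python
def boundary_term(P, sigma, q, rho):
    """Left-hand side of (4.15): sum_{b,g} q_b * (rho*delta_bg - q_g) * sigma2_bg."""
    n_edges, n_paths = len(P), len(q)
    total = 0.0
    for b in range(n_paths):
        for g in range(n_paths):
            sigma2_bg = sum(P[r][b] * P[r][g] * sigma[r] ** 2 for r in range(n_edges))
            delta_bg = 1.0 if b == g else 0.0
            total += q[b] * (rho * delta_bg - q[g]) * sigma2_bg
    return total


def load_term(P, sigma, q, rho):
    """Right-hand side of (4.15): sum_r sigma_r^2 * y_r * (rho - y_r)."""
    total = 0.0
    for r in range(len(P)):
        y_r = sum(P[r][a] * q[a] for a in range(len(q)))  # load of the user on edge r
        total += sigma[r] ** 2 * y_r * (rho - y_r)
    return total


# Hypothetical network: path 0 = edges {0, 1}, path 1 = edges {1, 2}, path 2 = edge {2}.
P = [[1, 0, 0],
     [1, 1, 0],
     [0, 1, 1]]
sigma = [0.5, 1.0, 2.0]
mixed = [0.5, 0.3, 0.2]   # the user mixes all three routes (rho = 1)
pure = [1.0, 0.0, 0.0]    # all traffic on path 0

lhs_mixed, rhs_mixed = boundary_term(P, sigma, mixed, 1.0), load_term(P, sigma, mixed, 1.0)
rhs_pure = load_term(P, sigma, pure, 1.0)
```

For the mixed flow both sides agree and are strictly positive, whereas for the pure flow every edge load is either 0 or $\rho_i$ and the term vanishes.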
In view of the above, unconditional convergence to Wardrop equilibrium appears
to be a "bridge too far" in our stochastic environment, especially when the
equilibrium in question is not pure - after all, mixed equilibria are not even traps of (4.4).
This leads us to the notion of stochastic stability:

Definition 4.2 (Arnold, 1974; Gikhman and Skorokhod, 1971). We will say
that $q \in \Delta$ is stochastically asymptotically stable with respect to the process $X(t)$
when, for every neighbourhood $U$ of $q$ and every $\varepsilon > 0$, there exists a neighbourhood
$V$ of $q$ such that:

(4.16)  $P_x\big\{X(t) \in U \text{ for all } t \ge 0 \text{ and } \lim_{t\to\infty} X(t) = q\big\} \ge 1 - \varepsilon$

for all initial conditions $X(0) = x \in V$.

As we mentioned before, the notion of stochastic stability features prominently 
in the analysis of stochastically perturbed evolutionary or Nash-type games because 
it is precisely the type of stability that the strict equilibria of these games exhibit 
(Imhof, 2005; Mertikopoulos and Moustakas, 2009a). Motivated by these results, we
are finally in a position to state our analogue of the folk theorem for stochastically 
perturbed congestion models: 

Theorem 4.3. Strict Wardrop equilibria are stochastically asymptotically stable
in the replicator dynamics (4.4).

Proof. By relabeling indices if necessary, assume that $q = \sum_i \rho_i e_{i,0}$ is the strict
Wardrop equilibrium of $\mathfrak{C}$. Inspired by the deterministic setting and the original
idea of Imhof (2005), we will show that $H_q$ is a local stochastic Lyapunov function,
i.e. that $LH_q(x) \le -kH_q(x)$ for some $k > 0$ and for all $x$ sufficiently close to $q$.
Our result will then follow from Theorem 4 in Gikhman and Skorokhod (1971, pp.
314-315).

To that end, consider a perturbed flow $x = \sum_{i,\alpha} x_{i\alpha} e_{i\alpha}$ close to $q$:

(4.17)  $x_{i,0} = \rho_i(1 - \varepsilon_i), \quad x_{i\mu} = \varepsilon_i\,\xi_{i\mu}, \quad \text{for } \mu = 1, 2, \dots \in \mathcal{A}_i \setminus \{0\},$

where $\varepsilon_i > 0$ controls the distance between $x_i$ and $q_i$, and $\xi_i$ is a point in the
face of $\Delta_i$ lying opposite to $q_i$ (i.e. $\xi_{i\mu} \ge 0$ and $\sum_\mu \xi_{i\mu} = \rho_i$). Then, in view of
Lemma A.1 in Appendix A, the adjoint potential $L_q$ will be bounded below by:

(4.18)  $L_q(x) \ge \sum_i \rho_i\,\varepsilon_i\,\Delta\omega_i,$

where $\Delta\omega_i = \min_\mu\{\omega_{i\mu}(q) - \omega_{i,0}(q)\} > 0$ (recall that $q$ is a strict Wardrop
equilibrium). Therefore, since the second term of $LH_q(x;\lambda)$ in (4.13) is clearly of order
$O(\varepsilon^2)$, we obtain:

(4.19)  $LH_q(x;\lambda) \le -\sum_i \rho_i\,\varepsilon_i\,\Delta\omega_i + O(\varepsilon^2),$

where $\varepsilon^2 = \sum_i \varepsilon_i^2$. On the other hand, we also have:

(4.20)  $H_q(x;\lambda) = \sum_i \frac{\rho_i}{\lambda_i}\log\frac{\rho_i}{x_{i,0}} = -\sum_i \frac{\rho_i}{\lambda_i}\log(1 - \varepsilon_i) = \sum_i \frac{\rho_i}{\lambda_i}\,\varepsilon_i + O(\varepsilon^2).$

Thus, if we pick some positive $k < \min_i\{\lambda_i\,\Delta\omega_i\}$, some elementary algebra gives:

(4.21)  $LH_q(x;\lambda) \le -k\sum_i \frac{\rho_i}{\lambda_i}\,\varepsilon_i + O(\varepsilon^2) = -kH_q(x;\lambda) + O(\varepsilon^2),$

thus showing that $LH_q(x;\lambda) \le -kH_q(x;\lambda)$ whenever $\varepsilon$ is small enough. □

In other words, Theorem 4.3 implies that trajectories which start sufficiently
close to a strict equilibrium will remain in its vicinity and will eventually converge
to it with arbitrarily high probability. Nonetheless, this is a local result: if the users'
initial traffic distribution is not close to a strict equilibrium itself, Theorem 4.3 does
not apply; specifically, if $X(0)$ is an arbitrary initial condition in $\Delta$, we cannot even
tell if the trajectory $X(t)$ will ever approach $q$.

To put this in more precise terms, it will be convenient to measure distances in
$\Delta$ with the $L^1$-norm: $\big\|\sum_\alpha x_\alpha e_\alpha\big\|_1 = \sum_\alpha |x_\alpha|$. In this norm, it is not too hard to
see that $\Delta$ has a diameter of $2\sum_i \rho_i$, so pick some positive $\delta < 2\sum_i \rho_i$ and let
$K_\delta = \{x \in \Delta : \|x - q\|_1 \le \delta\}$ be the corresponding compact neighbourhood of $q$.
Then, to see if $X(t)$ ever hits $K_\delta$, we will examine the hitting time $\tau_\delta$:

(4.22)  $\tau_\delta \equiv \tau_{K_\delta} = \inf\{t > 0 : X(t) \in K_\delta\} = \inf\{t > 0 : \|X(t) - q\|_1 \le \delta\}.$

Thereby, our chief concern is this: is the hitting time $\tau_\delta$ finite with high probability?
And if it is, is its expected value also finite?

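To make our lives easier, let us consider the collective expressions:

(4.23)  $\rho = \sum_i \rho_i, \quad \sigma^2 = \sum_r \sigma_r^2, \quad \lambda = \rho^{-1}\sum_i \rho_i\lambda_i, \quad \Delta\omega = \rho^{-1}\sum_i \rho_i\,\Delta\omega_i,$

where $\Delta\omega_i = \min_{\alpha \ne \alpha_i}\{\omega_{i\alpha}(q) - \omega_{i\alpha_i}(q)\} > 0$ is the minimum delay difference
between a user's equilibrium path $\alpha_i$ and his other choices. We then have:

Theorem 4.4. Let $q = \sum_i \rho_i e_{i\alpha_i}$ be a strict Wardrop equilibrium of a
congestion model $\mathfrak{C}$ and assume that the users' learning rates satisfy the condition:

(4.24)  $\lambda\sigma^2 < \Delta\omega.$

Then, for any $\delta < 2\rho$ and any initial condition $X(0) = x \in \Delta$ with finite relative
entropy $H_q(x;\lambda) < \infty$, the hitting time $\tau_\delta$ has finite mean:

(4.25)  $E_x[\tau_\delta] \le \frac{2H_q(x;\lambda)}{\delta\,\Delta\omega}\,\frac{2\rho}{2\rho - \delta}.$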
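Proof. As in the case of Theorem 4.3, our proof hinges on the expression:

(4.26)  $-LH_q(x;\lambda) = L_q(x) - \frac{1}{2}\sum_i \frac{\lambda_i}{\rho_i} \sum_{\beta,\gamma} \sigma_{\beta\gamma}^2\,(x_{i\beta} - q_{i\beta})(x_{i\gamma} - q_{i\gamma}),$

where $q = \sum_i \rho_i e_{i,0}$ is the strict equilibrium in question. In particular, set $x = q + \theta z$,
where $z = \sum_{i,\alpha} z_{i\alpha} e_{i\alpha} \in T_q\Delta$ is the "inward" direction:

(4.27)  $z_{i,0} = -\rho_i, \quad z_{i\mu} \ge 0 \text{ for } \mu = 1, 2, \dots \in \mathcal{A}_i^* \equiv \mathcal{A}_i \setminus \{0\} \quad \text{and} \quad \sum_\mu z_{i\mu} = \rho_i.$

Then, regarding the first term of (4.26), Lemma A.1 in Appendix A readily yields:

(4.28)  $L_q(q + \theta z) \ge \Phi(q + \theta z) - \Phi(q) \ge \theta\sum_i \rho_i\,\Delta\omega_i \quad \text{for all } \theta \in [0, 1].$

In a similar vein, the second term of (4.26) becomes:

(4.29)  $\sum_{\beta,\gamma} \sigma_{\beta\gamma}^2\,(x_{i\beta} - q_{i\beta})(x_{i\gamma} - q_{i\gamma}) = \theta^2\sum_{\beta,\gamma} \sigma_{\beta\gamma}^2\,z_{i\beta}z_{i\gamma} = \theta^2\sum_r \sigma_r^2\,w_{ir}^2,$

where $w_i = P^i(z_i)$. Since $|w_{ir}| \le \rho_i$ for all $r \in \mathcal{E}$, we then obtain the inequality:

(4.30)  $-LH_q(x;\lambda) \ge \theta\sum_i \rho_i\,\Delta\omega_i - \frac{\theta^2}{2}\sum_i \rho_i\lambda_i\,\sigma^2 = \rho\,\Delta\omega\,\theta - \frac{1}{2}\rho\lambda\sigma^2\theta^2,$

where $\rho$, $\sigma^2$ and $\lambda$, $\Delta\omega$ are the respective aggregates and averages of (4.23).

Suppose now that the rates $\lambda_i$ satisfy (4.24), i.e. $\lambda\sigma^2 < \Delta\omega$. In that case, the
RHS of (4.30) will be increasing for all $\theta \in [0, 1]$ and we will have:

(4.31)  $-LH_q(x;\lambda) \ge \rho\,\big[\theta\,\Delta\omega - \frac{1}{2}\lambda\sigma^2\theta^2\big] > 0$

for all $x$ with $\|x - q\|_1 \ge \theta\|z\|_1 = 2\theta\sum_i \rho_i = 2\rho\theta$. So, if $\|x - q\|_1 \ge \delta$, we get:

(4.32)  $-LH_q(x;\lambda) \ge \frac{\delta}{2}\,\Delta\omega - \frac{\delta^2}{8\rho}\,\lambda\sigma^2 \ge \frac{\delta}{2}\,\Delta\omega\,\Big(1 - \frac{\delta}{2\rho}\Big) > 0.$

Therefore, if $K_\delta$ is the compact neighbourhood $K_\delta = \{x \in \Delta : \|x - q\|_1 \le \delta\}$, we
will have $LH_q(x) \le -\frac{\delta}{2}\,\Delta\omega\,\big(1 - \frac{\delta}{2\rho}\big) < 0$ for all $x \notin K_\delta$. Then, by a simple (but
very useful!) estimate of Durrett (1996, Theorem 5.3, p. 268), we get:

(4.33)  $E_x[\tau_\delta] \le \frac{H_q(x;\lambda)}{\frac{\delta}{2}\,\Delta\omega\,\big(1 - \frac{\delta}{2\rho}\big)} = \frac{2H_q(x;\lambda)}{\delta\,\Delta\omega}\,\frac{2\rho}{2\rho - \delta}.$ □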
Recall now that Theorem 4.3 ensures that a trajectory $X(t)$ which starts
sufficiently close to a strict equilibrium $q$ will converge to $q$ with arbitrarily high
probability. Therefore, since Theorem 4.4 shows that $X(t)$ will come arbitrarily
close to $q$ in finite time, a tandem application of these two theorems yields:

Corollary 4.5. If $q$ is a strict equilibrium of $\mathfrak{C}$ and the players' learning rates
$\lambda_i$ satisfy (4.24), the trajectories $X(t)$ converge to $q$ almost surely.

Of course, if a strict Wardrop equilibrium exists, then it is the unique equilibrium
of the game (Proposition 2.7). In that case, Corollary 4.5 is the stochastic
counterpart of Theorem 3.2: if the players' learning rates are soft enough compared to
the level of the noise, then the solution orbits of the stochastic replicator dynamics
(4.2) converge to a stationary traffic distribution almost surely. A few remarks are
thus in order:



Remark 1 (Temperance and Temperature). Condition (4.24) shows that the
replicator dynamics reward patience: players who take their time in learning the
game manage to weed out the noise and eventually converge to equilibrium. This
begs to be compared with the (inverse) temperature analogy for the learning rates: if
the "learning temperature" $T = 1/\lambda$ is too low, the players' learning scheme becomes
very rigid and this intemperance amplifies any random variations in the experienced
delays. On the other hand, when the temperature rises above the threshold
$\sigma^2/\Delta\omega$, the stochastic fluctuations are toned down and the deterministic drift draws
users to equilibrium.

Remark 2. Admittedly, the form of (4.25) is a bit opaque for practical purposes.
To lighten it up, note that we are only interested in small $\delta$, so the term
$2\rho/(2\rho - \delta)$ may be ignored to leading order. Therefore, if we also assume for
simplicity that all players learn at the same rate $\lambda_i = \lambda$, we get:

(4.34)  $E_x[\tau_\delta] \le \frac{2h}{\lambda\,\Delta\omega}\,\frac{\rho}{\delta},$

where $h = \rho^{-1}\sum_i \rho_i\log(\rho_i/x_{i,0})$ is the "average" Kullback-Leibler distance between
$x$ and $q$.

This very rough estimate is pretty illuminating on its own. First and foremost,
it shows that our bound for $E_x[\tau_\delta]$ is inversely proportional to the learning rate
$\lambda$, much the same as in the deterministic setting where $\lambda$ essentially rescales time
to $\lambda t$. Moreover, because $\rho$ and $\delta$ are both $O(N)$, one might be tempted to think
that our time estimates are intensive, i.e. independent of $N$. However, since delays
increase (nonlinearly even) with the aggregate load $\rho$, the dependence on $N$ is
actually hidden in $\Delta\omega$; this also shows that the learning rates $\lambda_i$ do not have to
be $(1/|\mathcal{E}|)$-small in order to satisfy (4.24).
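
To get a feeling for the orders of magnitude involved, the sketch below evaluates a hitting-time bound of the same type. It is only an illustration under stated assumptions: it takes the drift lower bound $\rho[\theta\Delta\omega - \frac{1}{2}\lambda\sigma^2\theta^2]$ of (4.31) at $\theta = \delta/2\rho$ and divides the initial relative entropy by it, in the spirit of Durrett's estimate. The function names and the numerical values are hypothetical.

```python
def aggregates(rho, lam, d_omega, sigma):
    """Aggregates and rho-weighted averages in the spirit of (4.23)."""
    rho_tot = sum(rho)
    lam_avg = sum(r * l for r, l in zip(rho, lam)) / rho_tot
    d_omega_avg = sum(r * d for r, d in zip(rho, d_omega)) / rho_tot
    sigma2 = sum(s * s for s in sigma)
    return rho_tot, lam_avg, d_omega_avg, sigma2


def hitting_time_bound(entropy, delta, rho, lam, d_omega, sigma):
    """Upper bound for the mean hitting time of the delta-neighbourhood of q.

    entropy is the initial relative entropy H_q(x; lambda); the bound is
    entropy divided by the drift lower bound of (4.31) at theta = delta / (2 rho).
    """
    rho_tot, lam_avg, d_omega_avg, sigma2 = aggregates(rho, lam, d_omega, sigma)
    if not lam_avg * sigma2 < d_omega_avg:
        raise ValueError("slow-learning condition (4.24) is violated")
    if not 0 < delta < 2 * rho_tot:
        raise ValueError("delta must lie in (0, 2 * rho)")
    theta = delta / (2 * rho_tot)
    drift = rho_tot * (theta * d_omega_avg - 0.5 * lam_avg * sigma2 * theta ** 2)
    return entropy / drift


# Hypothetical data: two users with unit traffic, two noisy edges.
bound = hitting_time_bound(1.0, 0.4, rho=[1.0, 1.0], lam=[0.1, 0.1],
                           d_omega=[1.0, 1.0], sigma=[1.0, 1.0])
```

Raising the learning rates to `lam=[1.0, 1.0]` in this example violates (4.24), and the function refuses to return a bound.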

In any event, since strict equilibria do not always exist, we should return to
the generic case of interior equilibria $q \in \mathrm{Int}(\Delta)$. We have already seen that these
equilibria are not very well-behaved in stochastic environments: they are not
stationary in (4.4) and (4.15) shows that $LH_q$ is actually positive in their vicinity.
Despite all that, if the network $\mathcal{Q}$ is irreducible and the users' learning rates $\lambda_i$ are
slow enough, we will see that the replicator dynamics (4.4) admit a finite invariant
measure which concentrates mass around the (unique) equilibrium of $\mathfrak{C}$.

To state this result precisely, a little more groundwork is required. First off, akin
to the case of strict equilibria, it will be convenient to measure distances from $q$ with
a scaled variant of the $L^1$ norm. In particular, let $S_q = \{z \in T_q\Delta : q + z \in \mathrm{bd}(\Delta)\}$.
Since $\Delta$ is convex, any $x \in \Delta$ can be uniquely expressed as $x = q + \theta z$ for some
$z \in S_q$ and some $\theta \in [0, 1]$, so we define the projective distance $\Theta_q(x)$ of $x$ from $q$
to be:

(4.35)  $\Theta_q(x) = \theta \iff x = q + \theta z \text{ for some } z \in S_q \text{ and } 0 \le \theta \le 1.$

$\Theta_q$ is not a bona fide distance function by itself, but it closely resembles the $L^1$
norm: the "projective balls" $B_\theta = \{x : \Theta_q(x) \le \theta\}$ are rescaled copies of $\Delta$ ($S_q$ is the
"unit sphere" in this picture), and the graph $\mathrm{gr}(\Theta_q) = \{(x, \theta) \in \Delta \times \mathbb{R} : \Theta_q(x) = \theta\}$
of $\Theta_q$ is simply a cone over the polytope $\Delta$.
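
For an interior point $q$, the projective distance admits a simple closed form that is convenient in simulations. Requiring $q + (x - q)/\theta$ to lie in $\Delta$ with at least one vanishing coordinate gives $\Theta_q(x) = \max_{i,\alpha}(1 - x_{i\alpha}/q_{i\alpha})$; this formula is our own derivation from (4.35) rather than an expression quoted from the text, and the function name below is hypothetical.

```python
def projective_distance(x, q):
    """Projective distance (4.35) of x from an interior point q.

    x[i][a] and q[i][a] are the flows of user i on path a; every q[i][a] must be
    positive and x must carry the same total traffic per user as q.
    """
    worst = max(1.0 - xa / qa
                for xi, qi in zip(x, q)
                for xa, qa in zip(xi, qi))
    return max(0.0, worst)  # guards against tiny negative values caused by rounding


q = [[0.5, 0.5]]
theta_mid = projective_distance([[0.25, 0.75]], q)   # halfway to the boundary point (0, 1)
theta_bd = projective_distance([[0.0, 1.0]], q)      # a boundary point
theta_eq = projective_distance(q, q)                 # the equilibrium itself
```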

In a similar vein, we define the essence of a point $q \in \Delta$ to be:

(4.36)  $\mathrm{ess}(q) = \rho^{-1}\min\{\|P(z)\| : z \in S_q\},$

where $\|\cdot\|$ denotes the ordinary Euclidean norm and the factor of $\rho$ was included
for scaling purposes. Comparably to $\mathrm{red}(\mathcal{Q})$, $\mathrm{ess}(q)$ measures redundancy (or rather,
the lack thereof): $\mathrm{ess}(q) = 0$ only if some direction $z \in S_q$ is null for $P$, i.e. only if
$\mathcal{Q}$ is reducible.^

We are finally in a position to state and prove: 

Theorem 4.6. Let $q \in \mathrm{Int}(\Delta)$ be an interior equilibrium of an irreducible
congestion model $\mathfrak{C}$, and assume that the users' learning rates satisfy the condition:

(4.37)  $\lambda < \frac{4}{5}\,\frac{m\rho\kappa^2}{\sigma^2}, \quad \text{where } m = \inf\{\phi_r'(y_r) : r \in \mathcal{E},\ y \in P(\Delta)\} \text{ and } \kappa = \mathrm{ess}(q).$

Then, for any interior initial condition $X(0) = x \in \mathrm{Int}(\Delta)$, the trajectories $X(t)$
are recurrent (a.s.) and their time averages are concentrated in a neighbourhood of
$q$. Specifically, if $\Theta_q(\cdot)$ denotes the projective distance (4.35) from $q$, then:

(4.38)  $E_x\Big[\frac{1}{t}\int_0^t \Theta_q^2(X(s))\,ds\Big] \le \theta_\lambda^2 + O(1/t), \quad \text{where } \theta_\lambda^2 = \frac{1}{4}\Big(\frac{m\rho\kappa^2}{\lambda\sigma^2} - 1\Big)^{-1}.$

Accordingly, the transition probabilities of $X(t)$ converge in total variation to an
invariant probability measure $\pi$ on $\Delta$ which concentrates mass around $q$. In
particular, if $B_\theta = \{x \in \Delta : \Theta_q(x) \le \theta\}$ is a "projective ball" around $q$, we have:

(4.39)  $\pi(B_\theta) \ge 1 - \theta_\lambda^2/\theta^2.$



Following Bhattacharya (1978), recurrence here means that for every $y \in \mathrm{Int}(\Delta)$
and every neighbourhood $U_y$ of $y$, the diffusion $X(t)$ has the property:

(4.40)  $P_x\{X(t_k) \in U_y\} = 1,$

for some sequence of (random) times $t_k$ that increases to infinity. Hence, using the
recurrence criteria of Bhattacharya (1978), we will prove our claim by showing that
$X(t)$ hits a compact neighbourhood of $q$ in finite time (this is the hard part), and
that the generator of a suitably transformed process is elliptic.

^With some trivial modifications, these definitions can be extended to points in $\mathrm{bd}(\Delta)$ as well.
In particular, if $q$ is strict we get $\Theta_{q_i}(x_i) = \|z_i\|_1/2\rho_i$ and, as a consequence of Proposition 2.7,
we will have $\mathrm{ess}(q) > 0$ irrespective of whether $\mathcal{Q}$ is reducible or not.

Fig 4. Learning in the networks of Figure 1 with M/M/1 latency functions $\phi_r(y_r) = (\mu_r - y_r)^{-1}$
and arbitrarily chosen capacities $\mu_r$. Panels: (a) learning in an irreducible network; (b) learning
in a reducible network. Axis label (both panels): traffic flow of User 1 along path [1,0]. The
shades of gray represent the invariant distribution of (4.4) (obtained by numerical simulations)
and the flow lines (blue) are the solution trajectories of (3.5); in (b) they are actually
projections because there is a third user as well.

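Proof of Theorem 4.6. As we mentioned before, any $x \in \Delta$ may be (uniquely)
written in the "projective" form $x = q + \theta z$, where $\theta = \Theta_q(x) \in [0, 1]$ is the
projective distance of $x$ from $q$, and $z$ is a point in the "sphere" $S_q = \{z' \in T_q\Delta : q + z' \in
\mathrm{bd}(\Delta)\}$. In this manner, (4.13) becomes:

(4.41)  $-LH_q(x;\lambda) = L_q(q + \theta z) - \frac{\theta^2}{2}\sum_i \frac{\lambda_i}{\rho_i} \sum_{\beta,\gamma} \sigma_{\beta\gamma}^2\,z_{i\beta}z_{i\gamma} - \frac{1}{2}\sum_i \frac{\lambda_i}{\rho_i} \sum_{\beta,\gamma} q_{i\beta}\,(\rho_i\delta_{\beta\gamma} - q_{i\gamma})\,\sigma_{\beta\gamma}^2.$

With regards to the first term of (4.41), Lemma A.2 in Appendix A and the
definition (4.36) of $\kappa$ yield $L_q(q + \theta z) \ge \frac{1}{2}m\|P(z)\|^2\theta^2 \ge \frac{1}{2}m\kappa^2\rho^2\theta^2$. Moreover, we
have already seen in the proof of Theorem 4.4 that the second term of (4.41) is
bounded above:

(4.42)  $\frac{\theta^2}{2}\sum_i \frac{\lambda_i}{\rho_i} \sum_{\beta,\gamma} \sigma_{\beta\gamma}^2\,z_{i\beta}z_{i\gamma} \le \frac{1}{2}\rho\lambda\sigma^2\theta^2.$

We are thus left to estimate the last term of (4.41). To that end, (4.15) gives:

(4.43)  $\sum_{\beta,\gamma} q_{i\beta}\,(\rho_i\delta_{\beta\gamma} - q_{i\gamma})\,\sigma_{\beta\gamma}^2 = \sum_r \sigma_r^2\,y_{ir}(\rho_i - y_{ir}) \le \frac{1}{4}\rho_i^2\sigma^2,$

where the last inequality stems from the bound $y_{ir}(\rho_i - y_{ir}) \le \frac{1}{4}\rho_i^2$ (recall that
$0 \le y_{ir} \le \rho_i$). Combining all of the above, we then get:

(4.44)  $-LH_q(x;\lambda) \ge \frac{1}{2}m\kappa^2\rho^2\theta^2 - \frac{1}{2}\rho\lambda\sigma^2\theta^2 - \frac{1}{8}\rho\lambda\sigma^2 \equiv g(\theta), \quad 0 \le \theta \le 1.$

As a result, if $\lambda < \frac{4}{5}\lambda_0$ where $\lambda_0 = m\rho\kappa^2/\sigma^2$, it is easy to see that the RHS of (4.44)
will be increasing for $0 \le \theta \le 1$. Moreover, it will also be positive for $\theta_\lambda < \theta \le 1$,
where $\theta_\lambda$ is the positive root of $g$: $\theta_\lambda = \frac{1}{2}(\lambda_0/\lambda - 1)^{-1/2}$.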
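
The threshold $\lambda_0$, the lower bound $g$ of (4.44) and its positive root $\theta_\lambda$ are simple enough to tabulate. The sketch below does so for a hypothetical set of parameters (the numerical values and function names are assumptions for illustration), and confirms that $g$ vanishes at $\theta_\lambda$.

```python
import math


def lambda_0(m, rho, kappa, sigma2):
    """Critical rate lambda_0 = m * rho * kappa^2 / sigma^2."""
    return m * rho * kappa ** 2 / sigma2


def g(theta, lam, m, rho, kappa, sigma2):
    """Lower bound (4.44) for -L H_q at projective distance theta."""
    return (0.5 * m * kappa ** 2 * rho ** 2 * theta ** 2
            - 0.5 * rho * lam * sigma2 * theta ** 2
            - 0.125 * rho * lam * sigma2)


def theta_lambda(lam, m, rho, kappa, sigma2):
    """Positive root of g, defined under the slow-learning condition (4.37)."""
    lam0 = lambda_0(m, rho, kappa, sigma2)
    if not 0 < lam < 0.8 * lam0:
        raise ValueError("learning rate violates the slow-learning condition (4.37)")
    return 0.5 / math.sqrt(lam0 / lam - 1.0)


# Hypothetical parameters: m = 1, rho = 2, kappa = ess(q) = 0.5, sigma^2 = 1.
params = dict(m=1.0, rho=2.0, kappa=0.5, sigma2=1.0)
root = theta_lambda(0.1, **params)      # lambda_0 = 0.5, so lambda = 0.1 is admissible
value_at_root = g(root, 0.1, **params)
value_at_one = g(1.0, 0.1, **params)
```

With these values $\theta_\lambda = 1/4$, so the bound (4.39) would place at least $1 - (1/4)^2/\theta^2$ of the invariant mass inside the projective ball of radius $\theta$.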

So, pick some positive $a < g(1) = \frac{1}{2}\rho\sigma^2\big(\lambda_0 - \frac{5}{4}\lambda\big)$ and consider the set $K_a =
\{q + \theta z : z \in S_q,\ g(\theta) \le a\}$. By construction, $K_a$ is a compact neighbourhood of $q$
which does not intersect $\mathrm{bd}(\Delta)$ and, by (4.44), we have $LH_q(x;\lambda) \le -a$ outside $K_a$.
Therefore, if $\tau_a \equiv \tau_{K_a}$ denotes the hitting time $\tau_a = \inf\{t : X(t) \in K_a\}$, Theorem
5.3 in Durrett (1996, p. 268) yields:

(4.45)  $E_x[\tau_a] \le \frac{H_q(x;\lambda)}{a} < \infty$

for every interior initial condition $X(0) = x \in \mathrm{Int}(\Delta)$.

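Inspired by a trick of Imhof (2005), let us now consider the transformed process
$Y(t) = \Psi(X(t))$ given by $\Psi_{i\mu}(x) = \log(x_{i\mu}/x_{i,0})$, $\mu \in \mathcal{A}_i^* \equiv \mathcal{A}_i \setminus \{\alpha_{i,0}\}$. With
$\partial\Psi_{i\mu}/\partial x_{i\mu} = 1/x_{i\mu}$ and $\partial\Psi_{i\mu}/\partial x_{i,0} = -1/x_{i,0}$, Itô's formula gives:

(4.46)  $dY_{i\mu} = L\Psi_{i\mu}(X)\,dt + \lambda_i\,(dU_{i\mu} - dU_{i,0}) = L\Psi_{i\mu}(X)\,dt + \lambda_i\sum_r Q^i_{r\mu}\,\sigma_r\,dW_r,$

where $Q^i_{r\mu} = P^i_{r\mu} - P^i_{r,0}$ are the components of the redundancy matrix $Q$ of $\mathcal{Q}$ in
the basis $\tilde e_{i\mu} = e_{i\mu} - e_{i,0}$ of $T_q\Delta$; recall also (2.14) and the relevant discussion in
Section 2.2.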

We now claim that the generator of $Y$ is elliptic. Indeed, if we drop the user
index $i$ for convenience and set $A_{\mu r} = Q_{r\mu}\sigma_r$, $\mu \in \coprod_i \mathcal{A}_i^*$, it suffices to show
that the matrix $AA^T$ is positive-definite. Sure enough, for any tangent vector
$z = \sum_\mu z_\mu \tilde e_\mu \in T_q\Delta$, we get:

(4.47)  $\langle Az, Az\rangle = \sum_{\mu,\nu} (AA^T)_{\mu\nu}\,z_\mu z_\nu = \sum_{\mu,\nu}\sum_r Q_{r\mu}Q_{r\nu}\,\sigma_r^2\,z_\mu z_\nu = \sum_r \sigma_r^2\,w_r^2,$

where $w = Q(z)$. Since $\mathcal{Q}$ is irreducible, we will have $w \ne 0$, and in view of (4.47)
above, this proves our assertion.

We have thus shown that the process $Y(t)$ hits a compact neighbourhood of
$\Psi(q)$ in finite time (on average), and also that the generator of $Y$ is elliptic. From
the criteria of Bhattacharya (1978, Lemma 3.4) it follows that $Y$ is recurrent, and
since $\Psi$ is invertible in $\mathrm{Int}(\Delta)$, the same must hold for $X(t)$ as well. In a similar
fashion, these criteria also ensure that the transition probabilities of the diffusion
$X(t)$ converge in total variation to an invariant probability measure $\pi$ on $\Delta$, thus
proving the first part of our theorem.

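To obtain the estimate (4.38), note that Dynkin's formula (see e.g. Øksendal,
2007, Theorem 7.4.1) applied to (4.44) yields:

(4.48)  $E_x[H_q(X(t);\lambda)] = H_q(x;\lambda) + E_x\Big[\int_0^t LH_q(X(s);\lambda)\,ds\Big] \le H_q(x;\lambda) - \frac{1}{2}\rho\sigma^2(\lambda_0 - \lambda)\,E_x\Big[\int_0^t \Theta_q^2(X(s))\,ds\Big] + \frac{1}{8}\rho\lambda\sigma^2\,t,$

and with $E_x[H_q(X(t);\lambda)] \ge 0$, we easily get:

(4.49)  $E_x\Big[\frac{1}{t}\int_0^t \Theta_q^2(X(s))\,ds\Big] \le \theta_\lambda^2 + \frac{C}{t}, \quad \text{where } C = \frac{2H_q(x;\lambda)}{\rho\sigma^2(\lambda_0 - \lambda)}.$

We are thus left to establish the bound $\pi(B_\theta) \ge 1 - \theta_\lambda^2/\theta^2$ which shows that
the invariant measure $\pi$ concentrates its mass around the "projective balls" $B_\theta$. For
that, we will use the ergodic property of $X(t)$, namely that:

(4.50)  $\pi(B_\theta) = \lim_{t\to\infty} E_x\Big[\frac{1}{t}\int_0^t \chi_{B_\theta}(X(s))\,ds\Big],$

where $\chi_{B_\theta}$ is the indicator function of $B_\theta$. However, with $\Theta_q^2(x)/\theta^2 \ge 1$ outside $B_\theta$
by definition, it easily follows that:

(4.51)  $E_x\Big[\frac{1}{t}\int_0^t \chi_{B_\theta}(X(s))\,ds\Big] \ge E_x\Big[\frac{1}{t}\int_0^t \big(1 - \Theta_q^2(X(s))/\theta^2\big)\,ds\Big]$

and the bound (4.39) follows by letting $t \to \infty$ in (4.49). □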



We conclude this section with a few remarks on our results so far: 



Remark 1 (Learning vs. Noise). The nature of our bounds reveals a most
interesting feature of the replicator equation (4.4). On the one hand, as $\lambda \to 0$, we
also get $\theta_\lambda \to 0$ and the invariant measure $\pi$ converges vaguely to a point mass at
$q$. Hence, if the learning rate $\lambda$ is slow enough (or if the noise $\sigma$ is low enough), we
recover Theorem 3.2 (as we should!). On the other hand, there is a clear downside
to using very slow learning rates: the expected time to hit a neighbourhood of an
equilibrium is inversely proportional to $\lambda$. As a result, choosing learning rates is a
delicate process and users will have to balance the rate versus the desired sharpness
of their convergence.

Remark 2 (The Effect of Redundancy). The irreducibility assumption is actually
quite important: it appears both in the "slow-learning" condition (4.37) (recall
that $\mathrm{ess}(q) = 0$ if $q$ is an interior point of a reducible network) and also in the proof
that the generator of $Y(t) = \Psi(X(t))$ is elliptic. This shows that the stochastic
dynamics (4.4) are not oblivious to redundant degrees of freedom, in stark contrast
with the deterministic case (Theorem 3.2).

Regardless, we expect that an analogue of Theorem 4.6 still holds for reducible
networks if we replace $q$ with the entire (affine) set $\Delta^*$. More precisely, we conjecture
that under a suitably modified learning condition, the transition probabilities of
$X(t)$ converge to an invariant distribution which concentrates mass around $\Delta^*$ (see
Fig. 4(b)). One way to prove this claim would be to find a suitable way to "quotient
out" $\ker Q$ but, since the replicator equation (4.4) is not invariant over the redundant
fibres $x + \ker Q$, $x \in \Delta$, we have not yet been able to do so.

Remark 3 (Sharpness). We should also note here that the bounds we obtained
are not the sharpest possible ones. For example, the learning condition (4.37) can
be tightened and the assumption that $\phi_r' > 0$ can actually be dropped. In that case,
however, the corresponding expressions would become significantly more complicated
without adding much essence, so we have opted to keep our analysis
focused on the simpler estimates.

5. Discussion. In this last section, we will discuss some issues that have not 
been thoroughly addressed in the rest of the paper and provide some directions for 
future work. 

Learning and Optimality. We have already noted that the traffic flows which
minimise the aggregate latency $\omega(x) = \sum_i \rho_i\,\omega_i(x)$ in a network correspond
precisely to the Wardrop equilibria of a congestion model which is defined over the
same network and whose delay functions are given by the "marginal latencies"
$\phi_r^*(y_r) = \phi_r(y_r) + y_r\phi_r'(y_r)$ (see e.g. Roughgarden and Tardos, 2002). Hence, if
we set $\omega^*_{i\alpha}(x) = \sum_r P^i_{r\alpha}\,\phi_r^*(y_r)$ and substitute $\omega^*_{i\alpha}$ instead of $\omega_{i\alpha}$ in the replicator
dynamics (3.5) and (4.4), our analysis yields:

Theorem 5.1. Let $\mathfrak{C} \equiv \mathfrak{C}(\mathcal{Q}, \phi)$ be a congestion model with strictly convex
latency functions $\phi_r$, $r \in \mathcal{E}$, and assume that users follow a replicator learning
scheme with cost functions $\omega^*_{i\alpha}$. Then:

1. In the deterministic case (3.5), players converge to a traffic flow which
minimises the aggregate delay $\omega(x) = \sum_i \rho_i\,\omega_i(x)$.

2. In the stochastic case (4.4), if the network is irreducible and the players'
learning rates are slow enough, their time-averaged flows will be concentrated
near the (necessarily unique) optimal distribution $q$ which minimises $\omega$.

Of course, for a sharper statement one need only reformulate Theorems 3.2, 4.3,
and 4.6 accordingly (the convexity of $\phi_r$ replaces the monotonicity requirement
for $\phi_r$). The only thing worthy of note here is that the marginal costs $\phi_r^*(y_r)$ do
not really constitute "local information" that users can acquire simply by routing
their traffic and recording the delays that they experience. However, the missing
components $y_r\phi_r'(y_r)$ can easily be measured by observers monitoring the edges of
the network and could be subsequently publicised to all users that employ the edge
$r \in \mathcal{E}$. Consequently, if the administrators of a network wish users to figure out the
optimal traffic allocation on their own, they simply have to go the (small) extra
distance of providing such monitors on the network's links.
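
As a concrete instance of the marginal latencies $\phi_r^*(y_r) = \phi_r(y_r) + y_r\phi_r'(y_r)$, the sketch below uses an M/M/1 delay $\phi(y) = 1/(\mu - y)$, for which $\phi^*(y) = \mu/(\mu - y)^2$. The capacity and load values, as well as the function names, are hypothetical.

```python
def mm1_latency(y, mu):
    """M/M/1 delay phi(y) = 1 / (mu - y), valid for loads y < mu."""
    return 1.0 / (mu - y)


def mm1_latency_derivative(y, mu):
    """Derivative phi'(y) = 1 / (mu - y)^2 of the M/M/1 delay."""
    return 1.0 / (mu - y) ** 2


def marginal_latency(y, phi, dphi):
    """Marginal latency phi*(y) = phi(y) + y * phi'(y)."""
    return phi(y) + y * dphi(y)


mu, load = 2.0, 1.0  # hypothetical capacity and load of one edge
experienced = mm1_latency(load, mu)                  # what a user can observe locally
missing = load * mm1_latency_derivative(load, mu)    # the component a monitor must supply
marginal = marginal_latency(load,
                            lambda y: mm1_latency(y, mu),
                            lambda y: mm1_latency_derivative(y, mu))
```

Here the experienced delay and the monitored component each equal 1, so the marginal cost that users should respond to is twice the delay they observe on their own.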

Equilibrium Classes. In a certain sense, interior and strict equilibria represent
the extreme ends of the Wardrop spectrum, so it was a reasonable choice to focus
our analysis on them. Nevertheless, there are equilibrium classes that we did not
consider: for instance, there are pure Wardrop equilibria which are not strict, or
there could be "quasi-strict" equilibria $q$ in the boundary of $\Delta$ with the property
that $\omega_{i\alpha}(q) > \omega_i(q)$ for all $\alpha$ which are not present in $q$.

Strictly speaking, such equilibria are not covered by either Theorem 4.3 or The- 
orem 4.6. Still, by a suitable modification of our stochastic calculations, we may 
obtain similar convergence and stability results for these types of equilibria as well. 
For example, modulo a "slow-learning" condition similar to (4.37), it is easy to see 
that pure equilibria that are not strict are still stochastically stable. The reason we 
have opted not to consider all these special cases is that it would be too much trou- 
ble for little gain: the assortment of similar-looking results that we would obtain in 
this way would confuse things more than it would clarify them. 

Exponential Learning. In the context of $N$-person Nash games, we have already
mentioned that the replicator dynamics also arise as the result of an "exponential
learning" process, itself a variant of logistic fictitious play (Fudenberg and Levine,
1998; Mertikopoulos and Moustakas, 2009a). The way this scheme works is that
players keep cumulative scores of their strategies' performance and they employ
each strategy with a probability which is exponentially proportional to these scores.
As such, it is not too hard to adapt this method directly to our congestion setting.

In more detail, assume that all users $i \in \mathcal{N}$ keep performance scores $V_{i\alpha}$ of the
paths at their disposal as specified by the differential equation:

(5.1)  $dV_{i\alpha}(t) = -\omega_{i\alpha}(x(t))\,dt,$

where $x(t)$ is the traffic profile at time $t$. Based on these scores, the users then
update their traffic flows according to the Boltzmann distribution:

(5.2)  $x_{i\alpha}(t) = \rho_i\,\frac{e^{\lambda_i V_{i\alpha}(t)}}{\sum_\beta e^{\lambda_i V_{i\beta}(t)}},$

where $\lambda_i$ denotes the learning rate of player $i \in \mathcal{N}$ (this expression also explains
why these rates can be seen as inverse temperatures). In this way, by decoupling
these expressions, one obtains the deterministic replicator equation (3.5).
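
The passage from (5.1) and (5.2) to the replicator equation can be checked numerically for a single user: moving the scores a small step along $-\omega$ and differencing the resulting Boltzmann flows should reproduce the replicator vector field $\lambda\,x_\alpha(\omega_{\mathrm{avg}} - \omega_\alpha)$. In the sketch below, the scores, delays, rate and function names are all hypothetical, and the delays are frozen during the small step.

```python
import math


def boltzmann_flows(scores, lam, rho):
    """Flows x_a = rho * exp(lam * V_a) / sum_b exp(lam * V_b), as in (5.2)."""
    top = max(scores)  # subtracting the maximum avoids overflow, the ratio is unchanged
    weights = [math.exp(lam * (v - top)) for v in scores]
    norm = sum(weights)
    return [rho * w / norm for w in weights]


def replicator_field(x, omega, lam, rho):
    """Right-hand side lam * x_a * (omega_avg - omega_a) of the replicator equation."""
    omega_avg = sum(xa * wa for xa, wa in zip(x, omega)) / rho
    return [lam * xa * (omega_avg - wa) for xa, wa in zip(x, omega)]


scores = [0.0, -0.3, 0.2]   # hypothetical performance scores V_a
omega = [1.0, 0.5, 2.0]     # hypothetical path delays, frozen during the step
lam, rho, h = 2.0, 1.5, 1e-6

x = boltzmann_flows(scores, lam, rho)
# Central difference of the flows along dV_a = -omega_a dt.
forward = boltzmann_flows([v - h * w for v, w in zip(scores, omega)], lam, rho)
backward = boltzmann_flows([v + h * w for v, w in zip(scores, omega)], lam, rho)
numerical = [(f - b) / (2 * h) for f, b in zip(forward, backward)]
exact = replicator_field(x, omega, lam, rho)
```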

We thus see that exponential learning tells us nothing new in deterministic
environments. In the presence of noise, however, the scores $V_{i\alpha}$ also reflect any
fluctuations in the observed delays, so we obtain instead:

(5.3)  $dV_{i\alpha}(t) = -\omega_{i\alpha}(X)\,dt + \sigma_{i\alpha}\,dU_{i\alpha},$

where, as in (4.3), $dU_{i\alpha}$ describes the total noise along the path $\alpha \in \mathcal{A}_i$. Therefore,
if the users' flow profile $X(t)$ is updated according to (5.2), Itô's lemma now gives:



As far as the rationality properties of these new dynamics are concerned, a
simple modification in the proof of Theorem 4.3 suffices to show that strict Wardrop
equilibria are stochastically stable in (5.4). Just the same, the extra drift term in
(5.4) complicates things considerably, so results containing explicit estimates of
hitting times are significantly harder to obtain. Of course, this approach might well
lead to improved convergence rates, but since the calculations would take us too
far afield, we prefer to postpone this analysis for the future.

The Brown-von Neumann-Nash Dynamics. Another powerful learning scheme is
given by the Brown-von Neumann-Nash (BNN) dynamics (see e.g. Fudenberg and
Levine, 1998) where users look at the "excess delays"

(5.5)  $\psi_{i\alpha}(x) = [\omega_i(x) - \omega_{i\alpha}(x)]_+ = \max\{\omega_i(x) - \omega_{i\alpha}(x), 0\}$

and update their traffic flows according to the differential equation:



where '^i(x) — p^^"Y^^Xia'^ia(x). On the negative side, these dynamics require 
users to monitor delays even along paths that they do not employ. On the other 
hand, they satisfy the pleasant property of "non-complacency" (Sandholm, 2001): 
the stationary states of (5.6) coincide with the game's Wardrop equilibria and every 
solution trajectory converges to a connected set of such equilibria. 

In terms of convergence to a Wardrop equilibrium, Theorem 3.2 shows that the
replicator dynamics behave at least as well as the BNN dynamics (except perhaps
on the boundary of $\Delta$), so there is no real reason to pick the more complicated
expressions (5.5), (5.6). However, this might not be true in the presence of stochastic
fluctuations: in fact, virtually nothing is known about the behaviour of the BNN
dynamics in stochastic environments, so this question alone makes pursuing this
direction a worthwhile project.

Acknowledgements. We would like to extend our gratitude to M. Scarsini 
for a series of fruitful discussions on Braess's paradox which helped us clarify the 
differences between the equilibrial conditions that arise in congestion models. 

APPENDIX A: PROPERTIES OF THE ADJOINT POTENTIAL 
We collect here some of the most useful properties of the adjoint potential: 



(3.9) $L_q(x) \equiv \sum_{i,\alpha} (x_{i\alpha} - q_{i\alpha})\,\omega_{i\alpha}(x) = \sum_r (y_r - y_r^*)\,\phi_r(y_r) = \Lambda(y).$






To begin with, the equality in (3.9) stems from the invariance identity: 

(A.1) $\sum_\alpha z_\alpha \omega_\alpha = \sum_\alpha \sum_r z_\alpha P_{r\alpha} \phi_r = \sum_r w_r \phi_r, \quad z \in V,$

where $P$ (acting on $V$) is the indicator matrix of the network $Q$ and $w = P(z)$. It is then 
easy to verify that $L_q(x) = L_q(x')$ whenever $x' - x \in \ker Q$ (thus justifying (3.9)), 
and also that $L_q = L_{q'}$ iff $q' - q \in \ker Q$. As a result, the notation $L_q(x) = \Lambda(y)$ is 
consistent with any choice of $q \in \Delta^*$. 

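The invariance identity (A.1) and its consequence for redundant directions can be checked on a toy example. The sketch below assumes a hypothetical "diamond" network with four links and four paths, integer-valued increasing link latencies, and a single user. None of these choices come from the text, and all function names are illustrative.

```python
# Hypothetical diamond network: links a, b, c, d and four paths ac, ad, bc, bd.
# P[r][alpha] = 1 if link r belongs to path alpha (rows: links, columns: paths).
P = [
    [1, 1, 0, 0],  # link a
    [0, 0, 1, 1],  # link b
    [1, 0, 1, 0],  # link c
    [0, 1, 0, 1],  # link d
]


def loads(x):
    """Link loads y = P(x) induced by the path flows x."""
    return [sum(P[r][a] * x[a] for a in range(4)) for r in range(4)]


def link_latencies(y):
    """Illustrative increasing link latencies phi_r(y_r), integer-valued for exactness."""
    return [y[0] + 1, 2 * y[1], 3 * y[2] + 2, y[3]]


def path_latencies(x):
    """Path latencies omega_alpha(x) = sum_r P[r][alpha] * phi_r(y_r)."""
    phi = link_latencies(loads(x))
    return [sum(P[r][a] * phi[r] for r in range(4)) for a in range(4)]


def pairing_over_paths(z, x):
    """Left-hand side of (A.1): sum_alpha z_alpha * omega_alpha(x)."""
    return sum(za * wa for za, wa in zip(z, path_latencies(x)))


def pairing_over_links(z, x):
    """Right-hand side of (A.1): sum_r w_r * phi_r(y_r) with w = P(z)."""
    return sum(wr * pr for wr, pr in zip(loads(z), link_latencies(loads(x))))


def adjoint(x, q):
    """L_q(x) = sum_alpha (x_alpha - q_alpha) * omega_alpha(x)."""
    return pairing_over_paths([xa - qa for xa, qa in zip(x, q)], x)
```

In this network the direction `[1, -1, -1, 1]` lies in the kernel of the indicator matrix, so the flows `[3, 1, 2, 4]` and `[4, 0, 1, 5]` induce the same link loads and `adjoint` returns the same value for both.
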
This "adjoint" potential owes its name to the formula for integration by parts: 

(A.2) $\sum_r \int_{y_r^*}^{y_r} \phi_r(w)\,dw = \sum_r (y_r - y_r^*)\,\phi_r(y_r) - \sum_r \int_{y_r^*}^{y_r} (w - y_r^*)\,\phi_r'(w)\,dw.$

Since the latencies $\phi_r$ are increasing, this expression immediately yields the estimate 
(3.9): $\Phi(y) - \Phi(y^*) \le \Lambda(y)$. However, if we also assume that $q$ is strict (say $q = 
\sum_i \rho_i e_{i,0}$ for convenience), we can get a more direct bound: 

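Lemma A.1. Let $q = \sum_i \rho_i e_{i,0}$ be a strict Wardrop equilibrium and let $z \in T_q\Delta$. 
Then, for all $t \ge 0$ such that $x(t) = q + tz \in \Delta$, we have: 

(A.3) $L_q(q + tz) \ge \tfrac{1}{2} \sum_i \Delta\omega_i \|z_i\|_1\, t$, where $\Delta\omega_i = \min_{\mu \ne 0}\{\omega_{i\mu}(q) - \omega_{i,0}(q)\}$. 

Proof. Clearly, to have $q + tz \in \Delta$ for some $t > 0$, $z$ must be of the form: 

(A.4) $z = \sum_i z_i$ with $z_i = \sum_{\mu \in A_i^*} z_{i\mu}(e_{i\mu} - e_{i,0})$ and $z_{i\mu} \ge 0$ for all $\mu \in A_i^* \equiv A_i \setminus \{0\}$. 

So, let $f(t) = \Phi(y(t))$ where $y(t) = P(x(t)) = y^* + tw$ and $w = P(z)$. With $\Phi$ 
convex, we get $f(t) \ge f(0) + f'(0)t$ and a simple differentiation yields: $f'(0) = 
\frac{d}{dt}\big|_{t=0} \sum_r \Phi_r(y_r^* + t w_r) = \sum_r w_r \phi_r(y_r^*)$. However, thanks to (A.1) and (A.4), we 
may rewrite this sum as: 

(A.5) $\sum_r w_r \phi_r(y_r^*) = \sum_{i,\alpha} z_{i\alpha}\,\omega_{i\alpha}(q) = \sum_i \sum_{\mu \in A_i^*} z_{i\mu} [\omega_{i\mu}(q) - \omega_{i,0}(q)] \ge \tfrac{1}{2} \sum_i \Delta\omega_i \|z_i\|_1,$

because $\|z_i\|_1 = \sum_\alpha |z_{i\alpha}| = |-\sum_\mu z_{i\mu}| + \sum_\mu |z_{i\mu}| = 2\sum_\mu z_{i\mu}$. We thus obtain: 

(A.6) $L_q(q + tz) = \Lambda(y^* + tw) \ge f(t) - f(0) \ge \tfrac{1}{2} \sum_i \Delta\omega_i \|z_i\|_1\, t.$ □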

This lemma shows that $L_q$ increases at least linearly along all "inward" rays 
$q + tz$.^ This is not so if $q$ is an interior equilibrium: 



^It is interesting to note here the relation with Proposition 2.7: if the ray $q + tz$ is 
inward-pointing, then $z$ cannot be "redundant", i.e. we cannot have $z \in \ker P$. 

Lemma A.2. Let $q \in \mathrm{Int}(\Delta)$ be an interior Wardrop equilibrium and let $z \in 
T_q\Delta$. Then, for all $t \ge 0$ such that $x(t) = q + tz \in \Delta$, we have: 

(A.7) $L_q(q + tz) \ge \tfrac{1}{2} m \|P(z)\|^2 t^2$, where $m \equiv \inf\{\phi_r'(y_r) : r \in \mathcal{E}, y \in P(\Delta)\}$. 

Proof. Following the proof of Lemma A.1 above, we obtain: 

(A.8) $f'(0) = \sum_i \sum_\alpha z_{i\alpha}\,\omega_{i\alpha}(q) = \sum_i \sum_\alpha z_{i\alpha}\,\omega_i(q) = 0,$

where the second equality follows from the fact that $q$ is an interior equilibrium 
(that is, $\omega_{i\alpha}(q) = \omega_i(q)$ for all paths $\alpha \in A_i$), and the last one is a consequence of 
$z$ being tangent to $\Delta$ (meaning that $\sum_\alpha z_{i\alpha} = 0$). On the other hand, we also get: 

(A.9) $f''(t) = \frac{d^2}{dt^2} \sum_r \Phi_r(y_r^* + t w_r) = \sum_r w_r^2\, \phi_r'(y_r^* + t w_r).$

Clearly, since the set $P(\Delta)$ of load profiles $y$ is compact and the (continuous) 
functions $\phi_r'$ are positive, we will also have $m \equiv \inf\{\phi_r'(y_r) : y \in P(\Delta), r \in \mathcal{E}\} > 0$. 
We will thus have $f''(t) \ge m\|w\|^2$, and a first order Taylor expansion with Lagrange 
remainder easily yields: 

(A.10) $L_q(q + tz) = \Lambda(y^* + tw) \ge f(t) - f(0) \ge \tfrac{1}{2} m \|P(z)\|^2 t^2.$ □

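The contrast between Lemmas A.1 and A.2 can be seen numerically on a two-link toy example. The sketch below is only illustrative: it assumes one user with unit traffic and two parallel links (so that $P$ is the identity), and the latency functions and helper names are hypothetical choices that do not appear in the text.

```python
# Two parallel links, one user with unit traffic; all latencies are hypothetical.


def adjoint(x, q, latency):
    """L_q(x) = sum_alpha (x_alpha - q_alpha) * omega_alpha(x)."""
    omega = latency(x)
    return sum((xa - qa) * wa for xa, qa, wa in zip(x, q, omega))


def strict_case(t):
    """Latencies (x_0, x_1 + 2): q = (1, 0) is a strict equilibrium since 1 < 2."""
    q, z = [1.0, 0.0], [-1.0, 1.0]
    x = [qa + t * za for qa, za in zip(q, z)]
    value = adjoint(x, q, lambda x: [x[0], x[1] + 2.0])
    bound = 0.5 * 1.0 * 2.0 * t  # (1/2) * Delta omega * ||z||_1 * t, as in (A.3)
    return value, bound


def interior_case(t):
    """Latencies (x_0, x_1): q = (1/2, 1/2) equalises both paths."""
    q, z = [0.5, 0.5], [-1.0, 1.0]
    x = [qa + t * za for qa, za in zip(q, z)]
    value = adjoint(x, q, lambda x: [x[0], x[1]])
    bound = 0.5 * 1.0 * 2.0 * t ** 2  # (1/2) * m * ||P(z)||^2 * t^2, as in (A.7)
    return value, bound


if __name__ == "__main__":
    for t in (0.1, 0.25, 0.5):
        print(t, strict_case(t), interior_case(t))
```

In the strict case the computed value is $t + 2t^2$, which dominates the linear bound $t$. In the interior case it is $2t^2$, which only dominates the quadratic bound $t^2$.
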
APPENDIX B: STOCHASTIC CALCULATIONS 

This appendix is devoted to the calculations that are hidden under the hood of 
(4.13), the equation that describes the evolution of the relative entropy $H_q(x; \lambda)$. 

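Proof of Lemma 4.1. Let $V_q(t) = H_q(X(t); \lambda)$. We then have: 

(B.1) $dV_q = \sum_{i,\alpha} \frac{\partial H_q}{\partial X_{i\alpha}}\, dX_{i\alpha} + \frac{1}{2} \sum_{i,\alpha} \sum_{j,\beta} \frac{\partial^2 H_q}{\partial X_{i\alpha}\,\partial X_{j\beta}}\, (dX_{i\alpha}) \cdot (dX_{j\beta})$
      $= -\sum_{i,\alpha} \frac{q_{i\alpha}}{\lambda_i} \frac{dX_{i\alpha}}{X_{i\alpha}} + \frac{1}{2} \sum_{i,\alpha} \frac{q_{i\alpha}}{\lambda_i} \frac{(dX_{i\alpha})^2}{X_{i\alpha}^2}.$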

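The two derivatives of the relative entropy that enter Itô's formula in (B.1) can be sanity-checked by finite differences. The sketch below assumes a single user and the form $H_q(x; \lambda) = \lambda^{-1} \sum_\alpha q_\alpha \log(q_\alpha / x_\alpha)$ for the relative entropy. This definition is taken as an assumption here (it is not restated in this appendix), and the function names are illustrative.

```python
import math


def relative_entropy(q, x, lam):
    """Assumed form H_q(x; lam) = (1/lam) * sum_alpha q_alpha * log(q_alpha / x_alpha)."""
    return sum(qa * math.log(qa / xa) for qa, xa in zip(q, x) if qa > 0) / lam


def gradient(q, x, lam):
    """First derivatives dH/dx_alpha = -q_alpha / (lam * x_alpha)."""
    return [-qa / (lam * xa) for qa, xa in zip(q, x)]


def hessian_diagonal(q, x, lam):
    """Second derivatives d^2H/dx_alpha^2 = q_alpha / (lam * x_alpha^2).

    The mixed derivatives vanish, so only the diagonal matters in Ito's formula.
    """
    return [qa / (lam * xa ** 2) for qa, xa in zip(q, x)]


def finite_difference(q, x, lam, a, h=1e-5):
    """Central differences for the first and second derivative along coordinate a."""
    up = list(x)
    up[a] += h
    down = list(x)
    down[a] -= h
    h0 = relative_entropy(q, x, lam)
    hp = relative_entropy(q, up, lam)
    hm = relative_entropy(q, down, lam)
    return (hp - hm) / (2 * h), (hp - 2 * h0 + hm) / h ** 2
```

Comparing `finite_difference` with `gradient` and `hessian_diagonal` at a few interior points reproduces, up to discretisation error, the coefficients $-q_{i\alpha}/(\lambda_i X_{i\alpha})$ and $q_{i\alpha}/(\lambda_i X_{i\alpha}^2)$.
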
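However, with $X(t)$ being as in (4.4), we readily obtain 

(B.2) $(dX_{i\alpha})^2 = \lambda_i^2 X_{i\alpha}^2 \Big( dU_{i\alpha} - \rho_i^{-1} \sum_\beta X_{i\beta}\, dU_{i\beta} \Big)^2$
      $= \lambda_i^2 X_{i\alpha}^2 \Big[ (dU_{i\alpha})^2 - \frac{2}{\rho_i} \sum_\beta X_{i\beta}\, dU_{i\alpha} \cdot dU_{i\beta} + \frac{1}{\rho_i^2} \sum_{\beta,\gamma} X_{i\beta} X_{i\gamma}\, dU_{i\beta} \cdot dU_{i\gamma} \Big].$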

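As a result, we may combine the two equations (B.1) and (B.2) to obtain 

(B.3) $dV_q = -\sum_{i,\alpha} q_{i\alpha} [\omega_i(X) - \omega_{i\alpha}(X)]\, dt - \sum_{i,\alpha} q_{i\alpha} \Big[ dU_{i\alpha} - \rho_i^{-1} \sum_\beta X_{i\beta}\, dU_{i\beta} \Big]$
      $\quad + \frac{1}{2} \sum_{i,\alpha} \lambda_i q_{i\alpha} \Big( dU_{i\alpha} - \rho_i^{-1} \sum_\beta X_{i\beta}\, dU_{i\beta} \Big)^2.$

Therefore, if we focus on a particular user $i$, the last term of (B.3) gives: 

(B.4) $\frac{\lambda_i}{2} \sum_\alpha q_{i\alpha} \Big( dU_{i\alpha} - \rho_i^{-1} \sum_\beta X_{i\beta}\, dU_{i\beta} \Big)^2 = \frac{\lambda_i}{2} \Big[ \sum_\alpha q_{i\alpha} (dU_{i\alpha})^2 - \frac{2}{\rho_i} \sum_{\alpha,\beta} q_{i\alpha} X_{i\beta}\, dU_{i\alpha} \cdot dU_{i\beta} + \frac{1}{\rho_i} \sum_{\beta,\gamma} X_{i\beta} X_{i\gamma}\, dU_{i\beta} \cdot dU_{i\gamma} \Big],$

where the last sum uses $\sum_\alpha q_{i\alpha} = \rho_i$, 
and the lemma follows by substituting (B.4) into (B.3) and keeping only the resulting 
drift, that is, the first and third terms of (B.3). □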